PREFACE

In 1994 the Clinton Administration was developing policies for the National Information
Infrastructure (NII) and seeking to make a business case for investing public money in it. Interests
throughout the country, including those in the arts and humanities, were approached to help the
Administration articulate the importance of supporting the information revolution for economic
development, scientific and scholarly progress, and improvements in the quality of life. The Getty
Art History Information Program (AHIP), with the American Council of Learned Societies and the
Coalition for Networked Information, worked with scholars throughout the country to write a white
paper entitled “Humanities and Arts on the Information Highways: A Profile,” the early drafts of
which were influential in shaping the Administration’s Information Infrastructure Task Force
Committee on Applications and Technology report The Information Infrastructure: Reaching Society’s
Goals, especially the critical chapters on “Arts, Humanities and Culture on the NII.” The final
version of the white paper, issued in September 1994, was a major part of the public comment on the
Administration’s plan and the fullest articulation of the state of humanities computing at that time.

Staff who prepared these papers became keenly aware of how little was known about the range of
humanities projects exploiting information technologies and how sorely needed was a research
agenda for computing technology focused on the humanities. In future policy discussions, spokesmen for
the arts and humanities would need to draw more quickly on facts about the current state of
implementation, point to successes, and explain the specialized research needs posed by their fields. To
meet these perceived needs, AHIP undertook several projects under the rubric of the Networked
Access Project in late 1994 and 1995.

One of these, the Research Agenda Project, was designed to articulate a research agenda for arts and 
humanities computing and achieve consensus among researchers in technology and the humanities 
about the critical research needs in this field. Several dozen leaders in the field were asked to identify 
the important domains in arts and humanities computing research and nominate individuals best sit- 
uated to summarize the state of research in each. From the nominations, staff selected eight critical 
areas identified by large numbers of informants and commissioned eight brief papers. In order to 
allow as many people as possible to have input in shaping the final report, these papers were opened 
for discussion on the Internet in a private list for a month in early summer of 1995 and for
discussion on an open, loosely moderated list in the fall of 1995.

This report, therefore, takes into account ideas from the commissioned papers and the open- and
closed-list discussions as well as reviews specifically solicited from other individuals identified during
the process. It does not attempt to replace the original papers or discussion, but only to synthesize
their most salient aspects and to identify areas for action. The report recognizes that, while resultant
research would have a predominantly academic focus, such research would have an impact on the
broadest range of practitioners and audiences in the arts and humanities. Its purpose is to offer
public policy makers and private foundations the information they need to direct support for arts and
humanities computing into areas most critical for the disciplines.

After publication and dissemination of this report to participants in the discussions, AHIP hopes to
work with public and private foundations in an effort to increase and coordinate funding in these
fields. Future reports on the “State of Networked Cultural Heritage” may be needed to move the
agenda forward in future years.

EXECUTIVE SUMMARY
The rapid growth of multimedia computing 
and the Internet, and the entrance of the commercial
sector into information and the education
sector previously dominated by academic
interests, have raised the stakes for arts and 
humanities computing. In addition, ongoing 
reductions in funding for arts, humanities, and 
educational research (especially from the federal 
government) have made it imperative that dol- 
lars be well spent. In the spring of 1995, the 
Getty Art History Information Program 
(AHIP) asked several dozen experts to help it
identify the areas of research that they consid- 
ered critical to future progress in arts and 
humanities computing and to nominate special- 
ists who could knowledgeably reflect on these 
domains. Eight individuals were commissioned 
to write papers on these research issues, and two 
electronic discussions, open to the Internet 
community, were conducted to stimulate reac- 
tion to their views. This report uses the com- 
missioned papers and discussions as a basis for 
identifying issues that any research agenda in 
arts and humanities computing should address. 

The papers and discussions exposed four major
infrastructural issues and three significant intellectual
problems:

♦ The arts and humanities lack a venue, such as
an Annual Review of Arts and Humanities
Computing, a conference, or an electronic list,
through which progress on the research agenda
can be reported and assessed. Support for such
research forums is essential.

♦ The arts and humanities have not given rise to
a field of reflective study, analogous to the history,
philosophy, and sociology of science, with
a consequent lack of agreement among its practitioners
on the fundamental characteristics of
the fields and the conditions for successful systems
development and evolution. The study of
the arts and humanities as fields of human
endeavor is necessary to identify the critical
success criteria for software and systems.

♦ In the vast array of standards-setting and
de facto standardization processes under way in
the computing industry, the arts and humanities
need supported spokespersons to articulate
their constituents’ requirements. Without such
spokespersons, they will have no voice in the
development of software, communication and
display technologies, and standards governing
the range from applications to systems.



♦ The arts and humanities need to expose their 
practitioners, whether academic scholars, 
museum professionals, or librarians, to the dif- 
ference that computer-assisted scholarship and 
teaching could make. Promoting institutional 
and social changes that are essential to create a 
hospitable environment for computer-support- 
ed arts and humanities is thus a tactical 
requirement. 

The intellectual issues needing research are con- 
siderably more complex: 

♦ Representation — The crucial advantages of 
digital libraries lie in the flexibility of knowledge
representations to support different intellectual
perspectives and functionality.
However, if they are to create a unified and
comprehensive library of useful knowledge, the
arts and humanities must make significant
progress in the next decade in shared methods
of representation.

♦ Retrieval — If comprehensive libraries of 
useful knowledge are created, their use will
depend on improved means of access. 
Discovering appropriate resources in the net- 
worked environment and retrieving relevant 
information in a usable format will be criti- 
cal. Although the last generation of research in 
these areas has been far from conclusive, it is 
clear that distributed networks place new 
demands on discovery and retrieval. 

♦ Resource persistence — Even if resources of 
great utility can be created and found, schol- 
arship will depend on assurance that scholars 
can cite them at a fixed address, that they will 
look and behave consistently, and that they 
will persist over time. 

I. THE PAPERS

When dozens of experts were consulted, in the
spring of 1995, about their views of the most
important research problems to be resolved for
progress to be made in arts and humanities
computing, eight topics arose repeatedly as the
most significant issues for both the medium-
and the long term. Commissioned authors were
then asked to identify the nature of the questions
raised in each domain, the state of the art,
current research of importance, and what future
research, if funded, would offer the greatest
benefit to the arts and humanities. Seven of the
research problem sets can be viewed as occurring
in chronological order from the beginning
of a scholarly or creative process through to the
archival life of its products. The eighth paper
addresses societal mechanisms that affect this
sequence. Arranged in this order, the eight
background papers address:

1. “Tools for Creating and Exploiting
Content,” by Robert Kolker and Ben
Shneiderman, University of Maryland

2. “Knowledge Representation,” by Susan
Hockey, Center for Electronic Texts in
the Humanities, Rutgers and Princeton
Universities

3. “Resource Search and Discovery,” by Gary
Marchionini, University of Maryland

4. “Conversion of Traditional Source
Materials into Digital Form,” by Anne
Kenney, Cornell University

5. “Image and Multimedia Retrieval,” by
Donna Romer, Eastman Kodak

6. “Learning and Teaching,” by Janet Murray,
Massachusetts Institute of Technology

7. “Archiving and Authenticity,” by David
Bearman, Archives & Museum
Informatics

8. “New Social and Economic Mechanisms
to Encourage Access,” by John Garrett,
Corporation for National Research
Initiatives

This report summarizes some of the points
made by both the authors of these background
papers and the commentators who participated
in the electronic discussions. It builds on an earlier
paper in which this author posed questions
about the state of activity in important research
domains in order to stimulate dialogue as part of
the open listserv discussion of these issues on the
Internet during October/November 1995. The
online discussions in which this author participated
were intentionally open-ended to stimulate
debate. The intention of this paper is to
bring the discussions to closure, to focus on
resolvable issues, and to propose a middle- and
long-term agenda for further research. The reader
will observe that this discussion does not
attempt to fully address each point raised by the
contributed papers or by the online discussions;
the fault for any resulting imbalance lies entirely
with this author.



This report addresses the research papers in the
first section, reflecting the judgment of the 
experts consulted, that these represent the most 
important research domains. In the second 
through fourth sections, a series of cross-cutting 
research questions raised by the commissioned 
papers and discussions is addressed separately. 

My intention is not to suggest that the focus of 
research in arts and humanities computing 
should be anything other than the topics 
assigned to the principal authors, but rather to 
explore the issues they addressed from different 
intellectual perspectives. I hope this tactic broadens,
deepens, and in some cases recontextualizes
the points made in the commissioned papers. 

A. Tools for Creating and Exploiting Content 
Robert Kolker and Ben Shneiderman describe 
three strands of current research: the Internet, 
commercially available software, and tools
developed for specific research projects or pur- 
poses. While Sha Xin Wei of Stanford 
University correctly suggests that it is more 
appropriate to see the Internet as infrastructure 
than as a tool in itself, network-based applica- 
tions are playing a crucial role in shaping dis- 
course. We know little about how the arts and 
humanities are being influenced by these tools, 
or what other network tools might be desirable. 
Michael Joyce of Vassar College hints at the 
profundity of such influence by the tools for 
multimedia authoring and creation of hyper- 
linked knowledge bases. An unexpected subtext 
of the Kolker and Shneiderman paper is how 
much their examples of “successful” electronic 
support activities involved, and probably
depended on, successful human mediation, sug- 
gesting a need to train people to use tools rather 
than basic research into computing capabilities. 
By implication, continued success would entail 
funding more demonstration projects in special- 
ized disciplinary applications and ensuring that 
part of the research plans involve informing 
other practitioners. 

Discussants endorsed the call for research into 
computer interfaces and interfile standards, but 
it was clear from the discussion that there was 
disagreement on whether such research was crucial
in order to make computers easier for
everyone (including humanists) to use, or 
whether the humanities presented special 
requirements for interface design. Kolker and 
Shneiderman stress the need for future research 
by teams of humanists, specialists in human- 
computer interaction, and computer scientists 
to develop interface standards, software tools,
and content for specialized arts and humanities
users. Most of all they call for support to get 
tools into the hands of students and faculty. 
Since this is an infrastructure problem for 
which upgrading campus-based services is the
basic solution, a sound investment would 
appear to be challenge funding with success 
measured according to how much the arts and 
humanities faculty used the installed equipment 
in teaching and research.

B. Knowledge Representation 
The arts and humanities are self-conscious 
about how they express themselves; indeed, one 
might reasonably say that the arts and humani- 
ties are about the ways in which we express our- 
selves. Given this fact, it should not surprise us 
that knowledge representation was discussed by 
virtually every contributor to the conference 
and in all the commissioned papers. 

The first question, and the fundamental one 
raised by Susan Hockey in her paper, is what to 
represent. Not “Which sources should we cap- 
ture first,” but rather, “What about any source 
do we need to have explicitly represented?” To 
determine this, research is needed into what we 
mean by fidelity of representation in order to 
determine whether fidelity itself is an impossible,
or even undesirable, target. Commentators
noted that we need representations that are 
explicit about their limitations, assumptions, 
and biases; if so, what kinds of annotations are 
required, and how can they be normalized? The 
presence of such self-conscious notation was
identified as defining the quality of a represen- 
tation, beyond its mere fidelity to the original. 
Since most of the research to date has been on 
text, how can we emphasize all the other
modalities that convey artistic and humanistic 
knowledge? Research into the features of intel- 
lectual sources that most fully contribute to 
interpretation, understanding, and connections 
would be most useful if those participating 
either agreed to develop prototype applications 
or included in their research design steps to 
bring applications to demonstration. 

But even if intellectual perspectives and needs
of scholarship can define what is to be represented,
we still need to pursue research on how
to represent knowledge effectively, and further,
how to ensure its future operability. The discussants
seemed comfortable with Standard
Generalized Markup Language (SGML), but it
is clear that extensions, such as HyTime,
VRML (Virtual Reality Markup Language), and
other representation languages will also need to
be employed. Moreover, arts and humanities
practitioners will need to better understand why
they should not use HTML (Hypertext Markup
Language) without guidelines that ensure its conformity
with the SGML standard. Standards for
representing the content of still images, sounds, 
motion images, and three-dimensional graphical 
spaces are still needed. In general, these stan- 
dards will be beneficial to the arts and humani- 
ties if collective agreement is reached on the 
content of the resource annotations (or “meta- 
data”) required for humanistic scholarship. 
Convening groups to reach consensus on the 
descriptive elements that best support humanis- 
tic research will be productive for many years. 

The most vexing issue remains: Why represent 
knowledge? There is no question that we must 
by definition represent it for it to be digitally 
available, or that representations of knowledge 
are designed to serve specific purposes (or, if
not designed for such purposes, are unknowingly
valid only for limited purposes), but for what
purposes do we want to make knowledge representations?
In his comments in the discussion,
Michael Buckland of the University of
California at Berkeley emphasized the ways in
which representations become derived objects
in their own right and how semiotics research
can be usefully brought to bear on both questions
of knowledge representation and questions
of what knowledge representations mean in
themselves, as material cultural objects.
Elsewhere in the discussions the question arose 
of whether we could, or should, engender a 
research tradition that asks what meanings digi- 
tal genres have and for whom and what purpos- 
es they exist. We could take the position of 
technological imperative: that the sources of 
our civilization’s self-knowledge will be “re-presented”
digitally and that we must therefore
take steps to make the best representations. Or
we could try to answer, for different kinds of
source genres and media, why certain representations
will be better. A research agenda that
seeks to answer these questions will, if it pro- 
duces convincing answers, push the process of 
digital representation ahead quickly and need 
not be too costly. 

C. Resource Search and Discovery
It is axiomatic that if more and more resources
are going to be available electronically and are
to be of value to the arts and humanities, we
will need to better understand the process by
which researchers locate information of interest
to them. In what ways is discovery similar to,
and how different from, retrieval? We need fur- 
ther research to understand what differentiates 
and what contributes to the effectiveness of 
what Gary Marchionini describes as two very 
different processes. How can the next gener- 
ation of discovery tools better exploit browsing 
and take advantage of prior knowledge through 
guided discovery using authored links and support
feedback? How can networked data be
standardized so that its “handles” will allow
meaningful discovery at a consistent level of 
detail? What structures and strategies for unique 
and persistent identification of networked 
objects will be required, and how can the sys- 
tems on which electronic objects are created, 
stored, and accessed ensure such identification? 
Discovery research is potentially the most 
important new frontier for information science,
and important work can be done at a relatively 
low cost, because the resources being discovered 
are publicly available. 

Retrieval research, on the other hand, has a
long, if checkered, history. What further
research on retrieval is needed, and how can 
past research that addressed central databases be 
made relevant to the problems of access to dis- 
tributed resources with different functionalities? 
How much additional progress can be made in 
retrieving full text by means of automatic inter- 
mediation such as enhanced fuzzy-logic string 
searching, ranking of results, and using domain- 
based knowledge with user profiles? How can 
retrieval be improved by pre-processing with 
systems tools to index resources automatically, 
merge thesauri effectively, and analyze resources 
to support access to them by people with differ- 
ent levels of knowledge or different languages? 
How can mediated or software-assisted 
exchanges improve retrieval by enabling us to 
use knowledge of feedback to increase precision 
in searches and recall with and beyond brows- 
ing? It is not yet clear how much research in 
artificial intelligence and full-text enhancement 
is specific to the humanities or how much such 
research will contribute in the mid-term future, 
but the long-term promise is great. 

Finally, if we are truly to be a multimedia digital
culture, what research do we need to enable
optical pattern matching, searching for content 
in oral files, finding relevant chunks of multi- 
media, locating experiences rather than data, 
and matching similarities across modalities? 
Here the humanities are in serious need of
approaches and tools that will provide for
approximate retrieval: failure to develop such
tools means capping the potential of sound and 
image bases and requiring labor-intensive, sin- 
gle-perspective indexing of the digital source 
libraries. Investments in automated supports for 
multimedia indexing and retrieval are crucial, 
although this research may prove expensive. 

D. Conversion of Traditional Source 
Materials into Digital Form 
Most knowledge in the arts and humanities is
recorded in non-digital formats (most often as
printed, typed, or handwritten sources). If, as
Anne Kenney contends, we need functionally
robust surrogates, and we can decide what kinds
of functionality humanists require in their digital
representations (as urged under Knowledge
Representation), then what methods can we
develop (or better yet, what standard methods
can we deploy) to acquire that functionality? The
research agenda for such capture is as long as the
kinds of existing formats in which our knowledge
is stored and the kinds of surrogates we
need. As methods are suggested and implemented,
how can we evaluate them? What methods
need to be developed to make conversion cost-effective,
and what benefits will lead society to
support creation of surrogates that are richer
than the originals in their yield of knowledge
representation? Only large-scale, technically
sophisticated, academically based, multidisciplinary
research will push this agenda forward;
commercial efforts or individuals are unlikely to
contribute much to improving high-quality production
processes for digital surrogates.

While it is not, properly speaking, an issue of 
conversion but of delivery, unless research 
addresses and resolves questions of how to man- 
age very large collections of digital materials and 
provide useful access to them, the prospects for 
large-scale conversion are dim. Research into new 
compression techniques will be critical in the 
process. Economics plays a major role, as a busi- 
ness case ultimately must be made for the con- 
version of content. Moreover, research that leads 
to evaluation of post-conversion resources will 
support future conversions and improve methods 
and technologies of capture and delivery. 

Support for study of the economics of conversion,
and for demonstrating scalable technologies
and organization, will be crucial to the larger
vision of an electronically based, internationally 
accessible, arts and humanities corpus. 

Above all, Anne Kenney calls for quality benchmarks
(i.e., technical measures that can be
applied to digital files), which are crucial if we
are to exploit the ongoing development of commercial
tools, because only benchmarks will tell
us whether our requirements are being, or ever
have been, met by off-the-shelf methods.
Ultimately, these technologies will be accepted 
or rejected by the arts and humanities on the 
basis of display capabilities. But humanists will 
probably not contribute much to this arena of 
research, except in the design of user interfaces 
as discussed earlier by Kolker and Shneiderman. 

E. Image and Multimedia Retrieval
Marchionini was not alone in addressing image 
retrieval; almost everyone bemoaned the state of 
the art in digital multimedia. Donna Romer 
made clear in her paper that retrieval results are 
always based on data representations, but non- 
textual documents currently defy auto-indexing, 
and we know little about whether, how, and 
under what circumstances text-based approaches
enable image-based access. Indeed we know
little about “likeness” of images, which is the 
fundamental criterion for retrieval. Constructing 
an empirical basis for how best to represent sets
of images, in addition to or in place of individual
images, will also be necessary, since item-level
control is often missing in these large
image collections. Much of this research will 
need to begin at the beginning, with documen- 
tation of both the resource sets and the user 
communities. Romer points out that we first 
need to make sizeable, representative, well- 
known image sets and establish the character- 
istics of a variety of “points of view.” Such 
research can be expected to be expensive, time- 
consuming, and slow to produce results. 
Nevertheless, accessible multimedia resources 
are fundamental to the success of a more broad- 
ly based arts and humanities. 

If we are to create large collections of images for 
broad-based access, long-term digital image 
management will require a great deal more 
technical documentation of the images as 
objects with a history of capture techniques. 
Jennifer Trant of AHIP’s Imaging Initiative
notes that research on image documentation 
and image quality are issues of crucial concern 
to the Getty and that these are multidisciplinary 
endeavors, with implications (and therefore 
stakeholders) beyond the arts and humanities.
The Getty alone cannot sponsor the required 
research on image quality characteristics and 
methods of documenting the technical charac- 
teristics of digital images (aside from image 
contents or subjects). Research in this area
needs to be accompanied by standardization
efforts, education, and implementation strategies,
and by proselytizing to other fields. Again,
however, such ambitious goals are crucial to
making image data usefully available for scholarship
and appreciation.

F. Learning and Teaching
Usually it is our lack of understanding of the 
process of acquiring knowledge, rather than
technology, that impedes teaching and learning. 
But, according to Janet Murray, in some areas 
simple technological improvements could help 
in the short term. Because this is one research 
arena in which progress depends critically on 
knowing what is known, a major focus of 
research support should be to inform educators 
of the state of the research, the state of the 
tools, and the state of the resources available to 
them in digital form. Current knowledge in 
these areas is still quite inadequate, so signif- 
icant funding is needed to learn more about 
teaching and learning, test techniques using 
digital resources, and develop strategies for evaluating
teaching and learning as it takes place
using digital technologies. These projects are
relatively large scale, human intensive, cross-disciplinary,
often longitudinal, and will require
considerable support over a number of years. 

Murray emphasized the need for research in 
defining curriculum in the light of what the 
new technologies offer that could not be done 
previously, and the need for collaborative soft- 
ware development efforts to establish compati- 
ble materials and authoring environments 
customized for the needs of humanists. 

G. Archiving and Authenticity 
Implicit in David Bearman’s assessment of the
state of research in archiving is the dramatic 
shift that has taken place in the past five years as 
a result of the proliferation of local and wide 
area networks throughout organizations. These 
have led to the electronic creation and transmis- 
sion of virtually all organizational records. While 
this development affects organizational account- 
ability primarily, the longer-term implications 
for the arts and humanities are that the record of 
our culture, as we are creating and recording it
today, is increasingly digital. Because software 
and hardware change so rapidly, all efforts to 
preserve the original bits on the media on which 
they were initially stored are doomed. Instead,
research must focus on preserving context and
meaning, resident in higher-level representations
and functionality, while the practical
business of managing archival records across
time involves copying them onto currently supported
devices under the control of newer software.
Research into the functional requirements
for capturing and maintaining the intellectual 
character of records as evidence is quite far 
along, but ongoing support will be needed to 
standardize approaches, implement solutions, 
and train arts and humanities professionals (and 
the organizations in which they work) to 
archive records of contemporary works in ways 
that will be usable in the future. 

H. New Social and Economic Mechanisms to 
Encourage Access 

Perhaps the most difficult task is to lift our- 
selves out of our situation and envision differ- 
ent futures. John Garrett was asked not only to 
do that, but also to identify the research needed 
to invent those futures and report on the state 
of knowledge about hypothetical and futuristic 
social constructs as well as the cultural, intellec- 
tual, political, and economic tools needed to 
construct alternative futures. Neither Garrett, 
nor the discussion of organizational options and 
futures, produced a blueprint for social and 
economic change, but research support directed 
toward experiments, prototypes, and “re-inventions”
is probably the only way that the academic
community will move from its current
moorings into new waters. While such experiments
need not be costly in their infancy, they
should be designed to be real players in the real 
world. Foundations will need to develop tactics 
that enable them to fund or loan substantial 
quantities of capital to ensure that start-up ven- 
tures representing new ways of organizing the 
arts and humanities are structured as experi- 
ments, not as permanent, resource-creating 
projects. When such start-up ventures are 
funded, it is also important to hold at least part 
of the funding for research into the before and 
after, and into measurements of individual 
interactions that support fine-tuning or explor- 
ing alternative arrangements. 



Research Agenda Issue: 

A theme running consistently through the 
commissioned papers is that the state of arts 
and humanities computing is difficult to
gauge because it lacks an identity or focus. If 
the arts and humanities had a venue such as 
an Annual Review of Arts and Humanities 
Computing, or if existing mechanisms for 
reporting on humanities computing issues
could be made more responsive to the specific
needs of humanities disciplines rather than
to technological opportunities, the research
agenda could be advanced substantially. A 
major focus of any concerted research agenda 
should be to create such a structure. 

II. THE NATURE OF THE ARTS 
AND HUMANITIES 

Since the authors were asked to address research 
issues in humanities and arts computing, it is 
not surprising that many opened their discus- 
sions or prefaced treatment of specific topics by 
reference to the character of the humanities. 

Their papers and the online commentaries made 
clear that further research into how humanists 
work would help define the functional require- 
ments for supporting their activity. It would be 
useful not only to define the past, but also to 
develop baselines that would help us to under- 
stand how scholarship is being transformed by 
computing and digital communications tech- 
nologies. Serious thought should be given as to 
how to foster systemic study of the humanities 
and how to make the results of that research 
both known and useful to those developing sys- 
tems to support the arts and humanities. 

In the absence of a body of research on the 
social and intellectual systems of the arts and 
humanities, authors of the papers and discus- 
sants in the electronic conference cited impres- 
sionistic and undocumented attributes and 
derived from them criteria for evaluating the 
success of computing as a means of supporting 
these disciplines. Among the characteristics of 
the humanities the authors identified as impor- 
tant to shaping the research needs of its disci- 
plines were their presumed diversity, complexity 
of knowledge representation, variability in 
expression, historicity, textuality, cumulativeness,
and genre dependence. Often the authors con- 
trasted these, explicitly or implicitly, with pre- 
sumed characteristics of the sciences. But in the 
online discussion, their assumptions about the 
sciences and social sciences were frequently challenged;
although these assertions were also made
without reference to a body of research literature
that would have supported the debate, such a 
literature does exist in the history, philosophy, 
and sociology of science. As Leonard Will of 
Information Management Consultants in the 
UK put it, many were “struck by the absence of 
data on what humanities scholars actually do,”
despite the self-evident necessity of such research
in furthering the agenda of humanities computing.
He went on to suggest that research on this
aspect of the problem would also begin to
resolve the difficult questions of what benefits
would be obtained from different kinds of interventions
and implementations and by progress
in different sub-areas of research. 

A. Disciplinary Diversity

One way of thinking about the implications of 
diversity for formulating the research agenda is
to see it as a reflection of material conditions 
and an impediment to concerted action. In- 
deed, the way in which it was raised as an issue 
by Kolker and Shneiderman, who spoke of the 
“states of the art” within and between disci- 
plines, the disparity of equipment and access 
(mostly less than ideal) in different institutions, 
and the absence of humanities researchers 
among those engaged full-time in humanities- 
oriented computing research make it appear 
that diversity is a social and institutional charac- 
teristic of the arts and humanities. 

However, there may be more fundamental 
sources of diversity. In online comments, Nora 
Sabelli of the National Science Foundation and 
Sha Xin Wei of Stanford University noted that 
the differences between disciplines might run 
deeper, reflecting the nature of argument 
(descriptive, logico-deductive, dialectic) in dif- 
ferent fields. They suggested that the human- 
ities might contribute to other fields such as 
medicine and vice versa, based on diversity 
among these fundamental dimensions. Sha Xin 
Wei noted that mathematics was part of the 
classical humanities curriculum, and that “it 
consists of intuitions about, and elaboration 
upon, structures more akin to literature and art 
than to the empirical sciences.” 

Such commonalities in intellectual processes 
should be the link to software functionality, 
leading to software support for broadly defined 
styles of reasoning and argumentation, rather 
than discipline-specific methods. 
B. Complexity of Representation
Running through much of the discussion was 
the contention that the requirements for knowl- 
edge representation in the humanities are excep- 
tionally complex. To some extent this opinion 
reflects the views of specialists for whom off-the- 
shelf software is inadequate; it may therefore not 
be specifically about a fundamental characteristic 
of the humanities, but instead reflect the relative 
poverty and limited technical investment in 
humanities computing, which often requires 
humanists to use tools not specifically created 
for them. Examples cited, such as multilinguali- 
ty and methods for treatment of missing data 
(which is the norm in much humanistic 
research), are issues in day-to-day computing but
may not be requirements for a “research” agen- 
da. For example, Janet Murray of MIT, in her 
comments on the paper by Kolker and 
Shneiderman, identified two cases in which her 
work required development of specialized tools 
to retrieve text from foreign-language video sub- 
titles and support multiple links from any 
anchor point in an application of a video server. 
Unfortunately, it is far from clear that these 
issues of unique software design requirements 
can be addressed collectively; humanists and 
their funders may simply have to acknowledge 
that more funding needs to be directed toward 
appropriate software for specific tasks at hand. 

Susan Hockey referenced a more fundamental 
aspect of complexity of representation in the 
humanities, noting the prevalence of a multi- 
plicity of intellectual perspectives which the 
humanist wants to keep in the picture at all 
times, since much of the humanities is about 
styles of discourse and diversity of conceptual 
frameworks. The requirement to see a textual 
source simultaneously through a variety of 
interpretive lenses and to bring them together 
at various points differs fundamentally from the 
requirement to see a material object through a 
variety of optical lenses or wavelengths of light; 
what humanists mean here, and how comput- 
ing tools might assist them, deserves further 
research. The same observation is clearly true of 
images, although research in this area is much 
less developed. 

C. Variability of Expression
An interesting and important observation made 
by Gary Marchionini in the context of search 
and discovery was that the humanities actually 
encourage differences in ways of expressing 
ideas for the sake of interesting prose. Not only 
does this fact defeat many efforts to standardize
terminology or provide algorithmic methods of
analysis; it poses interesting challenges to intelligent
full-text information retrieval. As the concept
of “variations on a theme by. . .” makes
clear, the concept of derivation as new creation 
is fundamental to the arts and humanities. 
Variations become increasingly less derivative 
and can elaborate ideas far from the original 
theme, which creates fundamental challenges 
for developing tools that explore degrees of dif- 
ferences, especially when the original expression 
is in images and sounds. One of the complicat- 
ing factors here is that new humanistic works 
incorporate and elaborate on originals; in the 
digital environment especially, the act of cre- 
ativity itself can be blurred. 

D. Historical Orientation 
Artists and humanists are not alone in dealing 
with time as a variable in their research, as dis- 
cussant Warren Sanderson of Concordia 
University observed, but they are more likely 
than others to want access to older sources and 
to need to understand them as they were origi- 
nally understood. The implications of this ori- 
entation for arts and humanities computing 
research include the following: 

♦ Techniques for acquiring digital represen- 
tations of traditional source materials will 
continue to be important in the medium 
term because huge quantities of original 
materials need to be retrospectively digitized
to achieve a critical mass.

♦ Serious research is needed into the fungibili- 
ty of original sources and into their reusabil- 
ity before great efforts are expended in 
capturing the material. If certain types of 
sources are in fact highly fungible, substantial
effort could be saved in digitization. If
many sources are not reusable, or reusability 
of sources depends on highly specific tech- 
nical and intellectual characteristics, wasted 
investments can be avoided. 

♦ Humanists need to develop and employ col- 
lective methods for defining representation 
conventions used in treating source materi- 
als, and to incorporate into sources such lay- 
ered knowledge as commentaries, 
pathfinders, and attribution tools that both 
represent a point of view and reflect the 
understanding of others, from different his- 
torical periods, concerning the same objects. 



♦ Humanists will be dependent on research 
that preserves digital signals over long peri- 
ods of time (as reflected in points made by 
Peter Graham of Rutgers University) and
the meaning of digital representations over
time (as stressed by David Bearman of
Archives & Museum Informatics).

E. Textual Bias 

One of the subtexts of all the discussions was 
that much of humanities scholarship, outside 
the arts, has been strongly oriented toward text. 
Because the authors of the research agenda 
papers were specifically asked to think about 
non-textual information, they found many 
opportunities for additional research presented 
by image, sound, and multimedia. Contributors
to the debate clearly expected that “technology” 
would solve the problems associated with image 
standards and with integration of multimedia. 

In spite of disagreement about whether digital 
cameras had already achieved resolutions ade- 
quate for capturing primary materials, as reflect- 
ed in exchanges between Kevin Kiernan of the 
British Library and Anne Kenney of Cornell 
University, participants expressed no doubt that 
these pesky technical issues were going to be 
resolved shortly and without input from the 
humanities. Therefore, most of the discussion 
of research implications was focused on the 
concept of quality as it applies to any represen- 
tation made for any purpose. 

Contributors clearly felt quite comfortable with 
community-defined standards for knowledge 
representation, such as the choice of SGML 
markup and the Document Type Definitions of 
the Text Encoding Initiative (TEI) for text, but 
the call for further research began in earnest 
with markup of image or sound data. As usual, 
the question of how best to represent the knowl- 
edge embedded in such multimedia objects 
turned on the purpose of representations, the 
nature of the intended audience, and the meaning
of precision of reference and preservation
of context (to use Janet Murray’s criteria of
quality) when applied to different modalities
and different humanities disciplines. It was evident
that these questions have not been satisfactorily
answered and that substantial research
will be required to begin to identify features for 
integrated multimedia markup and to assess the 
benefits to artists and humanists of such value- 
added efforts. 
F. Cumulative Character

The arts and humanities are being developed, 
taught, and thought about on an ongoing basis. 
For many participants, the design of future 
teaching and learning was a critical topic of 
research for humanists. Bob Rosenberg of 
Rutgers University and Bob Arellano of Brown 
University urged further examination of the 
impact of digital delivery systems on learning 
and organizing teaching resources. Jerrold 
Maddox of Penn State University expressed 
concerns that his teaching had by necessity 
become more exam-based since his students 
were often 1,000 miles away, and proposed 
detailed study of the good and bad conse- 
quences of distance education. Janet Murray 
provided examples of how new intellectual par- 
adigms may resonate with the new technologies, 
as in the teaching of writing. There was no sim- 
ilar discussion of the teaching of art, although 
oblique reference was made to teaching drama 
using digital sources of previous performances 
of the same plays. 

What seems most interesting about the discus- 
sions of learning and teaching is the role of 
cumulative knowledge and the representation of 
cumulative knowledge. Current computing
tools provide the best environment we have yet
made for exploring such overlays as are created
by commentary built up over time. Research
into the benefits of using such methods for
learning will go a long way toward validating, 
or discrediting, their use in teaching. 

Research Agenda Issue: 

The arts and humanities have not given rise
to a field of self-study analogous to the history,
philosophy, and sociology of science,
long since designated a scholarly discipline 
in many universities. As a consequence, a 
lack of agreement on the fundamental characteristics
of the fields constituting the arts
and humanities precludes the conditions for 
successful systems development and evolu- 
tion. A research agenda that does not 
address how the arts and humanities can 
become the object of systemic study will have
little long-term impact on the state of tools, 
methodologies, and analytic frameworks for 
support of these fields. 
III. CHALLENGES ACROSS RESEARCH
DOMAINS 

Several proposed research challenges, while not 
attributed to the nature of arts and humanities 
per se, nonetheless applied across disciplines 
within arts and humanities. These research 
problems appear to be relevant to any body of 
organized knowledge elaborated upon by a 
community of practitioners. 

A. Disciplines as Symbolic Systems 
Disciplines, including those in the arts and 
humanities, are formal systems, with languages, 
representation conventions, and ways of think- 
ing. Moreover, different disciplines evolve dif- 
ferent ways of thinking about resources. If we 
are to develop adequate means for computing 
to serve “the arts and humanities,” understand- 
ing the differences between these formal sys- 
tems is crucial to model our representations of 
sources correctly. And if we are to decode their 
representation conventions accurately at a 
future time, documenting the representation 
rules we subsequently use will be essential. 

Little research has been conducted into the gen- 
res of expression used by humanities disciplines 
and the constantly evolving assumptions under- 
lying them. The claims that humanities disci- 
plines share the need to represent the processes 
and contexts of creation, and that precision of 
reference and preservation of context play a spe- 
cial role across disciplines, have as yet little sub- 
stantiation within the research literature. 

The design of the rules for SGML encoding 
adopted by the TEI, for example, anticipates the
ongoing analysis and markup of digitally cap- 
tured sources. The resulting many-layered repre- 
sentation, carrying perspectives of a number of 
disciplines and the attributions of many ana- 
lysts, will make genre analysis a major research 
issue for humanists. Defining the factors critical 
to understanding sources specific to different 
disciplines should inform future guidelines for 
text representation. 

B. Multimedia Representation 
To carry modalities of information other than 
text will require methods for linking one piece 
of information to another, including objects of 
different modalities, in ways that reflect the 
original (pre-digital) intention. Different kinds 
of objects have different functionality with 
respect to their links: for example, spoken 
objects need to be heard, three-dimensional 
objects need to be moved through and around, 
and objects that magnify parts of other objects
need to be “opened” when clicked. At a much
more fundamental level, in order to represent
multimedia data the way that end users perceive
it, humanists need to conduct research into the
meanings of the various modalities of information
and how meaning is affected when they are
combined. A variety of types of information 
cannot yet be used effectively because we lack 
ways of representing it digitally that will be 
available for use by others. To illustrate this 
point, Susan Hockey identified the problem of 
representing derived knowledge, while Anne 
Kenney pointed to pattern matching, object 
recognition, or raster-to-vector conversion. 

Thus, subjects for humanities research that 
would contribute to the evolution of new forms 
of digital communication include discourse on 
the construction of “intelligent files” that reflect 
modes of speaking, have “hot” links, execute 
scripts, and contain other dynamic and 
authored elements. 

While practical difficulties in managing the
evolving new genres such as the corpora and 
rich webs being created in some disciplines and 
specialties are not unique to the humanities, 
humanists have a special role to play in docu- 
menting and researching the implications of 
these new approaches for scholarship and teach- 
ing. Several disciplines in the arts and humani- 
ties will soon attain the stage at which large 
enough bodies of digital content exist to consti- 
tute the “critical mass” long thought essential 
for any serious research into the impact of mul- 
timedia. Any research agenda needs to join 
these fields of scholarship in virtual multi- 
disciplinary laboratories. 

C. The Need for Standards 
Standards, or the lack of them, were a major 
concern of most of the authors and are, of 
course, essential to effective communications. 
But what was meant by standards, and whether 
humanities-based research would contribute
specially to such standards, was not always clear. 
Kolker and Shneiderman invoked the need for 
interface standards and methods of accessing 
content; their focus on these was supported by 
commentators who felt that the humanities had 
special needs for Graphical User Interface
(GUI) standards beyond those being met today.
They were strongly seconded by Nancy Ide
(President, Association for Computers and the
Humanities), who viewed the success of elec- 
tronic means of research and teaching as 
inevitable but saw the development and promulgation
of appropriate user interface standards
as a sine qua non of that success. In particular,
reference was made to tools that would
support annotation and attribution, comparison 
and presentation, and synthesis. Warren 
Sanderson of Concordia University envisioned 
the framework as living between sustained narrative
and a database, allowing for drafting, dissemination,
amplification and modification, and
commentary. “It approaches,” he said, “the character
of a continuing seminar or colloquium.”

Sha Xin Wei cautioned, however, that standard- 
ized environment elements, such as the World 
Wide Web protocol, are not really tools but sim- 
ply infrastructure and that toolsets will be con- 
structed around scholarly tasks and disciplines. 

Susan Hockey explored the role of meta-data as 
independent representations of the logical and 
physical source, which led to the importance of
SGML for preventing obsolescence in text rep- 
resentation. She noted humanists’ need for mul- 
tiple parallel hierarchies in SGML (which 
remains a research problem) and the limitations 
of HTML in this respect. It is not evident that 
new standards are required for representing sig- 
nificant intellectual features of texts or multi- 
media, specific to the humanities; agreement on 
what meta-data ought to be employed for these 
purposes calls for further research. 

Janet Murray foresaw that teaching from texts 
will be severely hampered unless we can develop 
standards for text management software, but 
these are only the tip of a larger iceberg: appli- 
cation interoperability standards of value to 
education. Ron Overman of the National 
Science Foundation added that ethnographic 
databases, geographical databases, economic his- 
tory databases, and databased video all rep- 
resent environments needing common 
authoring and retrieval tools and standard 
methods to enhance intra- and interdisciplinary 
research. Because there is little reason to believe 
that interoperability standards are more neces- 
sary in the humanities than in other areas of 
endeavor, however, a research focus specific to 
the arts and humanities seems unnecessary. 

In some areas, the arts and humanities could be 
special beneficiaries. Current standards for digitization
of images are confined to technical
standards necessary to record pixels, rather than 
intellectual standards for recording the content 
and ideas the images represent. While technical 
standards help ensure quality of capture, Anne 
Kenney makes it clear that the humanities will 
always need to ask “quality for what purpose”;
content-level standards, based on intended use, 
will require further research into those uses. Of 
course, if the images are in color, humanists will 
be concerned that the surrogate has the same 
color as the original, a nearly impossible goal 
without standards for color management and 
display, which are still in their infancy.



System and architecture standards were not for- 
gotten. John Garrett noted the crucial need for 
reliable, standard infrastructures. Several such 
standards would be of special importance to the 
humanities, including location-independent 
naming of objects and registration methods for 
digital objects that will protect intellectual 
property and ensure credit. David Bearman 
called for immediate investment in standards 
for meta-data encapsulation of records to protect
their qualities as evidence, to fulfill an
essential aspect of trustworthy and reliable testi- 
mony critically important to all scholarship. 



Research Agenda Issues:

If the arts and humanities are to be successful
in influencing the development of software,
display and telecommunications
technologies, and standards ranging from
applications to systems, they will require
supported spokespersons capable of taking 
their position in the vast array of standards-setting
and de facto standardization processes
under way in the computing industry.
Substantial costs are entailed to retain the
technical expertise to play effectively in the 
standards arena. Further investments will 
be required to maintain regular contact 
with arts and humanities scholars and credibly
represent their interests. A research
agenda that overlooks the need to support 
such infrastructure will have little impact 
on the fundamental characteristics of com- 
puting and communications technologies. 



IV. INSTITUTIONAL CHALLENGES
The task of writing about new societal mecha- 
nisms was assigned to John Garrett. A broadly 
based response urged further study of emerging 
institutions and imagined institutional arrangements,
with experimentation the frequently recommended
means of exploring new
institutional structures. Virtually every partici- 
pant highlighted the need to understand and 
better manage the social dimensions, organiza- 
tional challenges, and economic constructs that 
the advent of digital networked communications
had brought to the humanities. The
called-for research ranged from providing sup- 
port and tools for humanistic scholars and 
developing more cost-effective methods of data 
capture, conversion, delivery, and distribution 
to more fundamental issues of promotion and 
tenure, peer review, access to resources, and 
support of “trailblazers.” While the need for 
research in these areas is not confined to the 
humanities, humanists keenly feel the absence 
of a framework for entering the digital age. 

A. Distribution of Scholarly Knowledge 
Communication is, above all else, essential to 
the arts and humanities. The system of dissemi- 
nation that supports them, in its broadest defin- 
ition, encompasses all means of publishing and 
performing. The significant changes that this 
system is undergoing raise many questions 
about its direction and method of getting there. 

Scholarship requires repositories of knowledge 
and communities of debate. Building libraries is 
the first task, and it is evident that we do not 
know technically how to go about the capture 
of digital information, where to get the funds, 
or where to begin. Katherine Jones-Garmil of 
Harvard University identified the serious need 
to move beyond the “greatest hits,” or works of 
canonical importance in a given discipline, to 
the primary sources of real value to scholarship. 
She and others called both for evolution of the 
electronic journal and for research into the ben- 
efits and drawbacks of electronic-only dissemi- 
nation of current knowledge. 

Accessing the resources, if and when they are 
digitized, is no easier. Toni Petersen of the Art 
& Architecture Thesaurus noted that “incred- 
ible funding resources are going to have to be 
applied to improve” discovery and retrieval.
Research by the Coalition for Networked 
Information over the past year has suggested the 
same. Even when electronic representations 
have been found, getting them to those who 
need them is no trivial matter. Kolker and
Shneiderman joined Janet Murray in calling for
research on how best to deliver electronic 
resources to students. The Museum Educational 
Site Licensing Project in the United States, 
which has drawn attention to this problem, is 
among the experimental fields in which research 
on these questions can be pursued. 

Once data is delivered, interpreting what has 
been sent and providing tools for understanding 
it presents no small task. Anne Kenney and 
Janet Murray pointed to the large compendia
and discipline-based projects that are creating a 
new resource, rather than simply a library of old 
sources, and to the social implications of creat- 
ing “course length” hypermedia. How, they and 
others asked, will the role of the scholar, as 
teacher, as author, as reader, or as curriculum 
developer change? 

Challenging part of the framework suggested by 
John Garrett, Paul Peters of the Coalition for 
Networked Information pointed to numerous 
studies, and to the need for many more, exam- 
ining how traditional roles in the production 
and dissemination of scholarship are breaking 
down and what is replacing them. The systems 
being studied are essentially those of the tradi- 
tional scholarly publishing chain, but other 
ecologies need analysis, too: the authors of fic- 
tion, poetry, music, dance, theater, film, and 
software are part of different dissemination 
chains that are no less affected by change, perhaps
even more so.

B. Education 

Despite the great promise of electronically networked
resources, higher education has yet to
capitalize on them as supports for its research, 
teaching, or service roles. The concerns of ele- 
mentary and secondary education were nearly 
invisible in the online discussion, but surely they
will have as great an impact as the universities
on the electronically resourced future of learning.
In any case, a research agenda that does not
look equally seriously at the implications of arts
and humanities computing for K-12 education,
and for lifelong learning, as it does at higher
education will fail in the most important
respect: it will lack relevance to the social context
in which the case for arts and humanities
computing must ultimately be made.

But this aspect of the research agenda is formi- 
dable. To begin with, we know very little about 
the use and impact of digital surrogates in 
learning. It may be too early to study the effects
of new media, and we may still know too little 
about learning itself. But it is not too early to 
formulate questions and to begin to gather 
baseline data from which to assess the inroads 
made by new methods of teaching and learning 
based on electronic resources and software-assisted
methods. Small-scale, controlled studies,
with substantial qualitative aspects, could
first serve as the basis for larger, quantitative 
studies that make comparative assessments. 



C. Law

Changes in society lead to changes in law. In 
the case of electronic resources in arts and 
humanities, these changes are still too inchoate 
to provide adequate support for potential devel- 
opments such as the copyright of electronic 
resources in education and the reliance on elec- 
tronic evidence for historical study. Janet 
Murray, and Jennifer Trant of AHIP’s Imaging
Initiative, expressed the contemporary uncertainty
regarding intellectual property law.
Specifically, these uncertainties are seen as hav- 
ing current negative impact on media studies, 
but the longer-term impacts will be on all uses 
of historical resources that need to be converted 
to electronic form. David Bearman pointed to 
legal uncertainties about what it means to pre- 
serve electronic evidence and how failure on the 
part of governments and individuals to create 
authoritative electronic records will impede 
future historical research. 

Research, combined with advocacy, can ad- 
vance arts and humanities interests within legal 
frameworks. Research that defines specific
harms and identifies equally specific remedies is 
essential to future electronic scholarship. The 
pace of legislation is generally faster than that of 
research. Thus the challenge is to fund anticipatory
research by policy research groups already
in place.

D. Economics

During the conference, there was only indirect
discussion of the importance of economic
research to the agenda of humanities comput- 
ing. Yet humanists often feel that the agenda for 
software research, for example, is being set by 
commercial firms with needs and priorities dif- 
ferent from theirs, and that the nature of the 
medium and its use is being determined by 
infotainment rather than by educational interests.
Although considerable research has been
conducted on the economics of the current, 
paper-based information delivery models in
libraries, the discussion neither referenced this 
work nor called for more. Nevertheless, only a 
better understanding of the economics of the 
systems that support arts and humanities will
change both those systems and the flows of 
resources through them, to achieve desired new 
ends. Any serious research agenda for arts and
humanities computing will support research on
the economics of capture, storage, retrieval,
delivery, and use of electronic resources, as well 
as examine the costs of failure to develop an 
appropriate mechanism for arts and humanities
to exploit computing capabilities. 

E. Communication Technology
The attention given by the authors to issues of 
communication, collaboration, and dissemina- 
tion highlighted the transition over the last 
decade from freestanding to networked com- 
puting. Virtually all the authors, while celebrat- 
ing the virtues of the Internet, bemoaned its 
primitive organization of resources and access 
methods. Research into both automatic and 
human-assisted finding tools for making 
resources known was seen indisputably as yielding
the greatest benefit. Its value would increase
in proportion to the continued growth of 
resources and might exceed the benefits of sim- 
ply adding new materials. 

While authors and discussants pointed to exem- 
plary Internet sites, they acknowledged the 
severe limitations of common knowledge-representation
toolsets such as those based on
HTML. The advantages of mixed media in the
digital network nevertheless raise a host of
research problems, ranging from such basic 
technical issues as linking objects of different 
modalities and determining appropriate levels 
of compression research to more fundamental 
demands for greater understanding of user
needs and perceptions. The sense that digital 
multimedia is the beginning of a new means of 
human communication has yet to give birth to 
a research framework in which the meaning of 
this revolution, and the means for promoting it, 
can be understood. 

The concern for the instability of the current
network was accompanied by a certain despair
over how the arts and humanities could influ-
ence it to become more the kind of long-term,
supportive communications environment they
need. Specifically, dramatic improvements in
display technologies and interoperability stan-
dards need to be developed and sustained to
overcome the current impermanence of the vir-
tual networked library. Of critical importance is
research to identify methods to prevent destruc-
tion of the last or archival copy of a work as
well as means to ensure that archiving solutions
in a networked environment will prove both
scaleable and susceptible to implementation.

Finally, the participants saw a need for new
tools. In the face of their inability to digest the
thousands of new tools being thrust out into
the market annually, there was nevertheless a
sense that some classes of tools were not fully
understood, would not be made by the com-
mercial sector, or would not be effectively used
by arts and humanities scholars without sub-
stantial new support. In addition to better
methods of search and discovery, the leading
requirement was for stronger mechanisms to
support editorial or critical review and the ana-
lytic and annotation facilities they required. The
widespread call for tools that could evaluate,
automatically summarize, and integrate differ-
ent sources raised the implicit question of how
the humanist’s role will change when software
performs these traditional intellectual tasks for
the scholar.

The absence of baseline data about what com-
munications and computing facilities the arts
and humanities are using, and for what pur- 
poses, makes it difficult to identify where best 
to invest in research. The first research issue, 
therefore, will be to establish baselines. 

F. People 

In the midst of large-scale social change,
understanding what is happening to people
and their interactions with technology is criti-
cal to making it work better. This requires not
a one-time study, but rather an ongoing effort
of many different disciplines over the foresee-
able future. What kinds of questions will have
to be asked, again and again, to navigate
through this transition? What skills are needed,
what meanings are to be imparted, what meth-
ods are to be employed?

Kolker and Shneiderman called for ongoing
research into the shifting computer-literacy
needs of faculty and students. One could rea-
sonably extend this call to the general public
and to younger students as well. Probably of
equal importance to the humanities, as Anne
Kenney and Donna Romer pointed out in their
discussion of image representation issues, is
understanding the meanings that new informa-
tional genres will have for their “readers” (even
the concept of “reader” will have to give way to
a viewer/participant/contributor), how represen-
tations will function as surrogates, and how
they will serve purposes beyond surrogacy. We
will need to continue to explore the cultural
and discursive implications of nonlinearity and
multiple intellectual perspectives on a single
text, issues raised by Susan Hockey. What will
the impact of availability be on the perceived
usability of images by the end user, as discussed
by Anne Kenney?

Skills and meaning will merge in determining
what tools future researchers will need and how
they will use them. Ongoing research into the
demand for structured-vocabulary searching, full-
text searching, and searching through knowledge
bases using intelligent agents will help change
methods of representing knowledge in digital
collections. Ongoing research into image analysis
and description, indexing, and annotation, and
the use of machine intelligence to locate images
through pattern matching and object recogni-
tion, as called for by Donna Romer, can have
equivalent implications. The ultimate need is for
a research basis to determine not only the effect
of future intelligent objects on scholarship but
what kinds of intelligence they, and the systems
that support them, ought to have in order to
contribute to scholarship.

If research could bring about Gary Marchionini’s
vision of search and discovery tools integrated
with creation, use, and communication tools,
how would that vision change his identified
need for electronic analogs of existing genres of
finding tools? If research establishes that the arts
and humanities address an imprecise audience
with many varied intellectual perspectives, as
numerous commentators suggested, what
requirements will this place on software to pro-
vide multiple approaches, layered representa-
tions, and well-tested interface methods? If, as
Donna Romer asks, we can find ways to mean-
ingfully identify content attributes within
images for automatic identification by comput-
ers, we will still need to understand visual
thinking processes (which, in turn, will evolve
rapidly). How much more so the representation
of motion and music, in which the state of the
art today is so primitive?

We can readily agree with Janet Murray that
hypermedia authoring and reference environ-
ments are urgently needed, yet have no idea of
the impact of these tools on the humanities
and the arts. The leitmotif here, as John
Garrett reminds us, is that there is a strong
interplay among technology, scholarship, and
society and that we have yet to begin the job of
studying these variables to tune the system.
What far-reaching consequences would collab-
oration tools with mechanisms for assigning
responsibility and credit have? How will low-
ered entry barriers for scholarly publishing
affect the humanities?



Finally, Bearman reminds us that the entire 
concept of evidence has its roots in the culture 
and that the digital object and digital commu- 
nications will transform both our concepts of 
evidence and the literary warrant for records. 

How records are used, an area that has long 
been under-studied, will continue to cry out for 
attention; in a time of changing methods and 
problems, the answers will be needed more than
ever. Katherine Jones-Garmil of Harvard
University adds that the electronic journal and
electronic dissemination of research upset exist-
ing paradigms of authenticity and authority.

G. Sources 

It is, of course, equally important to understand
what is happening to the genres of symbolic
expression themselves. Virtually every author
stressed the significance of research into elec-
tronic genres and our understanding them as
means of expression. Kolker and Shneiderman
raised the question indirectly in reviewing
exemplary Internet sites: what makes a “home
page” valuable, effective, or even interesting?
Susan Hockey asked more explicitly for research
into ways of creating a new genre that she
believes is essential for scholarship in the
humanities: one in which representations of
structure and content are independent, multiple
perspectives and versions can be interrelated,
and nonlinearity can be supported. Anne
Kenney asks us to understand not only what
different genres are, but also what are their
functional requirements for digital representa-
tions to enable us to devise automatic capture
settings and make decisions about conversion
priorities with automated selection and control.

Michael Joyce of Vassar College contributed
numerous examples of collaborative work in
MOO (Multi-user Dungeon, Object Oriented)
space and of collaborative approaches growing
out of the “Computers and Composition”
movement that have spawned software, jour-
nals, conferences, and even new disciplinary
associations. In his view the radically new
means of expression interact with the complexi-
ty of the “feminist, post-modernist and other
radical” content of the expression they have
engendered. Donna Romer calls on us to con-
duct research into the formal properties of gen-
res in different modalities and to explore how to
create and exploit an entirely new genre, the
“visual thesaurus.” And, of course, we have the
genre of nonlinear writing, for which we need
both better tools and a basis for understanding.






When John Garrett calls for research on 
resource identification systems he in part 
reflects the need to identify what a unique 
resource actually is in an age in which the “orig- 
inal” and the “copy” are indistinguishable and 
expression involves evolutionary versions, bor- 
rowing, and references to external entities. 

Bearman’s model of records as transactions will
require research on how best to capture meta- 
data defining the record, creating new genres of 
communicated transactions and new require- 
ments for robust, functional representations. 

Research Agenda Issues: 

Identifying institutional and social changes 
essential for creating a hospitable environ-
ment for computer-supported arts and
humanities is critical since neither the 
human nor capital resources for changing 
everything are available. Research that 
begins to identify critical success factors and 
locate current barriers will help realize the
potential of arts and humanities computing. 

Cited e-mail contributions to the discussions 
(other than those in the commissioned 
papers). In each case, the names and institu- 
tional affiliation of discussion contributors are 
cited in the text. 

Bob Arellano (Brown University), “Re: Learning and Teaching,” October 10, 1995

David Bearman (Archives & Museum Informatics), “Re: Archiving,” July 16, 1995

Michael Buckland (University of California, Berkeley), “Knowledge Representation,” November 28, 1995

Peter Graham (Rutgers University Libraries), “Archiving,” July 10, 1995

Peter Graham (Rutgers University Libraries), “Re: Re: Advanced Archiving Technologies,” July 24, 1995

Nancy Ide (Association for Computers and the Humanities), “Comments on Tools for Creating and Exploiting Content,” November 15, 1995

Katherine Jones-Garmil (Harvard University), [no subject line], November 15, 1995

Michael Joyce (Vassar College), “Comments on Learning and Teaching Paper,” November 14, 1995

Anne Kenney (Cornell University), “Re: Tools, Representation, Image,” July 19, 1995

Kevin Kiernan (University of Kentucky), “Conversion,” June 25, 1995

Kevin Kiernan (University of Kentucky), “Tools, Representation, Image,” June 27, 1995

Jerrold Maddox (Pennsylvania State University), “Learning and Teaching,” October 6, 1995

Janet Murray (MIT), “Imagining Ideal Environments,” June 26, 1995

Janet Murray (MIT), “Re: Tools, Representation, Image,” June 28, 1995

Janet Murray (MIT), “Re: Advanced Archiving Technologies,” July 21, 1995

Janet Murray (MIT), “Functionalities for Humanities Scholars,” July 21, 1995

Ron Overman (NSF), “Re: Tools for Creating and Exploiting Content Paper,” November 6, 1995

Paul Peters (CNI), “John Garrett’s ‘New Social and Economic Mechanisms’ Paper,” October 11, 1995

Toni Petersen (Getty AHIP, AAT), “Re: Resource Search and Discovery Paper,” October 26, 1995

Bob Rosenberg (Rutgers University), “Re: Learning and Teaching,” October 9, 1995

Nora Sabelli (NSF), “Re: Hockey Paper,” October 26, 1995

Jerry Saltzer (MIT), “Re: Advanced Archiving Technologies,” July 12, 1995

Warren Sanderson (Concordia University), “Resource Search and Discovery,” October 11, 1995

Jennifer Trant (Getty AHIP), “Re: Tools, Representation, Image,” July 19, 1995

Leonard Will (Consultant), [no subject line], November 28, 1995

Sha Xin Wei (Stanford University), “Tools for Creating and Exploiting Content,” October 24, 1995

Synopsis of Research Opportunities and Funding Needs



INTELLECTUAL ISSUES 

♦ Shared methods of representation serving different perspectives and functionalities in order to create a
unified and comprehensive library of useful knowledge

Knowledge representation

10ff Effective representation of knowledge
10 Different degrees of “fidelity” in knowledge representation
10, 17 Meta-data elements required for humanistic research
10 Application of semiotics research to knowledge representations as material cultural objects
10, 13 Functionalities that humanists require of digital representations
12 Preserving context and meaning in higher-level representations of knowledge
17, 19 The meanings of new information genres, their function as surrogates, and how meaning
is affected when genres are combined

Conversion, treatment, and documentation of sources

10 Features of intellectual sources that help interpret, understand, and connect them
14 Simultaneous multiple interpretations of resources
15 Tools to distinguish degrees of difference between original sources and their various
derivations
15f Consensus on methods of defining and documenting representation conventions for
source materials

Multimedia

10, 18 Standards for representing content of non-textual media
15 Identifying features for marking up integrated multimedia

♦ Discovery and retrieval in a distributed network environment

11, 17 Methods to manage, and provide access to, large collections of digital materials
11 How retrieval resembles, and differs from, discovery
11 The relevance to the networked environment of prior research on centralized databases

Tools

11 Discovery tools that better exploit browsing capabilities and prior knowledge
17 Retrieval tools that support annotation and attribution, comparison and presentation,
and synthesis
17 Common authoring and retrieval tools to enhance intra- and interdisciplinary research
20 Automatic and human-assisted discovery tools

Text and editing

11 Better tools and techniques for full-text retrieval, including pre-processing, linguistic
analysis, and artificial intelligence
20 Mechanisms and analytic/annotation facilities to support editorial or critical review

Multimedia

11 Retrieval models applicable to multimedia, including optical pattern matching,
approximate retrieval, and automated indexing
12 The effectiveness of text-based retrieval in image-based resources
12 Criteria for representing image sets, rather than individual images
16 Methods for linking multimedia information objects, to reflect their pre-digital intention
21 Structured-vocabulary searching, image analysis and description, indexing, annotation,
and machine intelligence for retrieval

Standards

11 Data standards for uniquely identifying networked objects, to ensure meaningful
discovery at a consistent level of detail
9 Interface standards

♦ Persistence of computerized resources over time to ensure future stability of knowledge 



Economic factors

11 Cost-effective methods for digital conversion of resources
11 Demonstrations of scaleable technologies and organization for digital conversion
15 Retrospective digitization of large quantities of original materials, to achieve a
“critical mass”
19f The economics of capture, storage, retrieval, delivery, and use of electronic resources;
costs of failure to exploit computing capabilities
20 Methods to ensure that networked archiving solutions are scaleable and implementable

Quality

11 Criteria for evaluating converted resources, to foster further and improved capture and
delivery
11 Quality benchmarks for conversion
12 Image quality characteristics and methods of documenting technical characteristics of
digital images
18 Standards for meta-data encapsulation of records, to protect their qualities as evidence

Methods

11 Compression techniques
13 Standardized approaches, implementation, and training in methods of archiving
digital records
1 Preservation of digital signals and their meaning
16 Standards for color management and display
18 Location-independent naming of objects; registration methods for digital objects that
protect intellectual property and ensure credit
18 Techniques for creation of digital libraries
20 Display technologies and interoperability standards, to overcome current impermanence
of virtual libraries
20 Methods to prevent destruction of archival copies

INFRASTRUCTURAL ISSUES 

♦ Publication, conference, or electronic discussion list through which to report and gauge progress on the 
research agenda 

8, 13 Publication such as Annual Review of Arts and Humanities Computing to report and
assess papers on the research agenda
9 Informing practitioners of research results in specialized disciplinary applications
12 How best to inform educators of the state of research, tools, and digital resources

♦ Consensus among humanists on fundamental characteristics of their fields, and on criteria for developing
software and systems

Understanding

9 Knowledge of how network tools are influencing the arts and humanities
10 Investigation of the meanings of digital genres, their audiences, and their purposes
12 Documentation of resource sets and their user communities, to establish the characteris-
tics of varied “points of view”
13f Understanding humanists' working practices, as a basis for defining functional require-
ments that support their activity
16 Genres of expression used by humanities disciplines and the evolving assumptions that
underlie them
20 Baseline data about use of communication and computing facilities and for what purposes

Impact on traditional disciplines

14 Software for specific humanistic disciplines
16 Understanding of sources specific to different disciplines
17 Linking disciplinary “critical masses” into virtual multidisciplinary laboratories

Use

9, 17 Computer interfaces and interface standards
12 Collaborative software development to create customized authoring environments for
humanists
18 Content-level standards based on intended use

♦ Support for advocacy on behalf of the humanities in technical and standards development

17 Documentation of implications of new technologies for teaching and scholarship
17 Humanists' needs for standards and techniques beyond those already available
17 Need for humanists to define interoperability standards for their disciplines

♦ Promoting computer-assisted scholarship and teaching in the arts and humanities

Educational impact

12 Defining curriculum in light of new technological capabilities
16 Impact of digital delivery systems on learning and organization of resources, for design of
future teaching and learning
16 The consequences of distance education
16 Examination of benefits of using computing tools in teaching, to validate or discredit
their use
17 Development of standards for application interoperability that are of value to education
18 Effective delivery of electronic resources to students
19 K-12 education and lifelong learning

Training and use

9f Placing computer and network tools in the hands of students and faculty, and training
them in their use
10 Upgrading campus-based services
20 Support for classes of tools that humanists understand poorly or use ineffectively, or that
would not be produced commercially
20 Shifting computer-literacy needs of faculty and students

Scholarly communication

17 Evolution of new forms of digital communication through discourse and “intelligent files”
18 Evolution of electronic journals; benefits and drawbacks of electronic-only dissemination
of knowledge

Advocacy and planning

9 Demonstration projects in discipline-specific applications
12 Education, implementation strategies, and proselytizing for use of digital images
19 Advancement of arts and humanities interests within legal frameworks; anticipatory
research by policy research groups

ADDITIONAL AREAS FOR RESEARCH

Institutional and social impact

13 Experiments, prototypes, and “re-inventions” leading to social and economic change in
the humanistic academy
18 Emerging institutions, imagined institutional arrangements, and new institutional structures
18 Management of social dimensions, organizational challenges, and economic constructs
resulting from networked communications
19 Breakdown and replacement of traditional roles in production and dissemination of
scholarship
20 Human-computer interaction

Robert Kolker and Ben Shneiderman
University of Maryland



STATE OF THE ART 

To be true to the spirit of the humanities, we need to talk about states of the art. The humanities are
a large umbrella under which many disciplines carry on many varieties of work, almost all of which 
may be subdivided into smaller components, down to the unique research done by a particular indi- 
vidual. Because humanities research is only occasionally carried on by teams or under the rubric of a 
collective project, computer-based tools and access to content are currently (with a few exceptions) 
distributed across many sites and many individual projects. 

In most institutions of higher learning, the humanities include performance and artistic production 
(theater, music, literature, filmmaking, painting, sculpture, photography); critical and theoretical
work (art history, literary theory and criticism, film and media theory and criticism, rhetoric, philos-
ophy, linguistics); research (history, literature, art history, music history, film and communications);
and language learning. Frequently these areas intersect. 



Given this diversity and the fact that much of the work of the humanities has been traditionally 
intuitive rather than deductive — and based profoundly on the book — acceptance of technology is 
slow but increasing at a steady rate. On the most fundamental level of equipment, enormous dispari- 
ties exist. Most researchers and professors in the humanities still use low-powered DOS-based or 
Mac computers to do word processing and e-mail. Networking is not universal, though many have 
some kind of Internet hookup. Some are content with this level of access, but may be unaware of 
more sophisticated possibilities and opportunities to improve their work lives. With increased train- 
ing and knowledge of such possibilities, they should be able to raise their interest levels and improve 
their access, which will mean their work will have a greater impact on their intellectual communities, 
their students, and their publics. 

Others in the humanities are actively exploring how technology can advance their research and 
teaching. A few devote most of their research to creating computer-based tools for their disciplines. 
Some team projects are developing common access techniques. For individual research projects, how-
ever, even the best work is often, perhaps usually, done without careful attention to human inter-
action factors.

CURRENT RESEARCH AND ITS PROMISE 

The majority of current humanities research can be divided into three categories:

♦ The Internet, which can be subdivided into electronic discussion groups and Web sites. 

♦ Existing software, such as graphics, presentation, database and database front-ends, and multi- 
media authoring packages used to develop discipline-specific applications. 

♦ Original software developed for specific or general research projects. 

Network access is among the most important tools for the humanities and perhaps the first many 
faculty use when they step beyond word processing. The wide variety of discussion groups, which 
permit free circulation of ideas, are especially useful in helping colleagues share information. For
example, the NEH-supported H-Net (a network of over 57 humanities listservs supervised by the
University of Illinois-Chicago and Michigan State University) provides moderated forums in
areas as diverse as women’s history, American studies, ethnic, immigration, film history, rural and
agricultural studies, and comparative literature and computing. Other humanities electronic discus-
sion groups have waxed and waned over the years, probably because they were too general. But most
H-Net groups seem to thrive because of focus and careful supervision.

But these and other network-accessed discus-
sion groups suffer from the lack of a unified
network interface and an accessible source of
information about their very existence and the
procedures necessary for signing up. A single
university may have many different ways to
make a network terminal connection, from a
simple telnet client to a more sophisticated or
customized user interface developed for a par-
ticular department or college. Typically, some-
one finds out about one discussion group by
already being signed up on another. While
directories (of listservs, institutions, archives,
bibliographies, people, etc.) exist, they are not
commonly known. Finding them requires an
existing level of knowledge about how to search
the Internet. Such haphazardness of access and
knowledge is a primitive constraint, keeping
information from people who could benefit
from it.

The World Wide Web provides what might be
called a general, external common interface for
those who can access it. The menu functions
of Mosaic or Netscape viewers are the same for
anyone using the software. The important
consideration, therefore, is the design of a spe-
cific site, what information it presents, and
how it is organized.

The University of Virginia’s Institute for
Advanced Technology in the Humanities,
directed by John Merritt Unsworth, maintains
one of the most advanced sites in humanities
research. The interface is simply and clearly
organized; the content is rich and growing.
IATH provides an outlet for the work of
University of Virginia scholars, such as the
nineteenth-century scholar and textual theorist
Jerome McGann, who is constructing an
archive of text, manuscript, and images by the
poet and artist Dante Gabriel Rossetti. The his-
torian Edward L. Ayers maintains a site in
progress on the Civil War, The Valley of the
Shadow. The experimental video and computer
artist David Blair is constructing an elaborate
MOO site for his Wax Web project. IATH also
offers computing resources to a roster of fellows
from other universities. In collaboration with
North Carolina State University, IATH edits
and publishes Oxford University Press’s
Postmodern Culture, one of the few scholarly,
refereed, online journals in the humanities.
IATH, the most clearly focused site for exploit-
ing humanities content, manages, through a
fairly simple and consistent use of HTML, to
present a diverse set of issues in text editing,
historical research, and film and cultural stud-
ies. It makes use of plain text and multimedia
tools and depends on a technologically aware
cohort of scholars in the field to access and con-
tribute to it.

Electronic Text Centers, because of licensing and
copyright restrictions, provide services that are
often restricted to one university community.
They have limited Internet and Web access that
provides reference and lookup services (card cata-
logs, and texts of the OED, Shakespeare, and
other literary works that can be searched). A few
present graphical images of manuscripts. Much
literature appears on the Internet (novels, poet-
ry, and drama), but few texts are of dependable
authenticity. It will be crucial for Electronic Text
Centers, perhaps in conjunction with publishers,
to create a body of authorized, searchable texts
with access mechanisms universally available.
Centers such as the Electronic Text Center of
the University of Virginia’s Alderman Library and
The Center for Electronic Texts in the
Humanities (a joint project of Princeton and
Rutgers universities, also associated with the
Text Encoding Initiative) are helping to solve
the matter of editorially dependable computer-
accessible texts by undertaking major initiatives
in digitizing manuscripts and creating authorita-
tive texts using SGML.

There are other, scattered, networked projects.
A recent Web site, established by the University
of Chicago and Notre Dame, exhibits manu-
scripts by Dante. Among the art exhibits now
proliferating, the best design remains that of the
Web Louvre project by Nicolas Pioch. The
Getty Art History Information Program’s
Museum Educational Site Licensing Project, a
multi-university initiative to explore networked
access to museum images, should help organize
strategies and methods for networked access to
images of art objects.

Many scholars are using commercial software
packages and, in some instances, creating original
software to produce information access programs,
multimedia projects, and teaching modules in
the humanities. However, these are often not
widely known beyond the developer or the par-
ticular discipline. Well-established programs,
such as the Max MIDI composition program,
use C language in an object-oriented environ-
ment to produce composition modules. The
Perseus Project from Harvard uses CD-ROM
for multimedia research in ancient Greek
history and literature. The Academic Software
Development Group at Stanford University has
developed Media Weaver, a distributed author-
ing system for hypermedia and media streams
both off- and online, which is currently being
used for projects as diverse as Chaucer’s poetry
and the history of Silicon Valley.

Peter Donaldson, Director of MIT’s
Shakespeare Multimedia Project, has developed 
a Mac-based interface that matches the 
Shakespearean text with moving images (from 
laserdiscs and Quicktime files) drawn from vari- 
ous filmed versions of the plays. The project 
allows students to compare different readings by 
different filmmakers and actors in ways that 
explain not only the text, but the varieties of 
cinematic interpretation as well. It offers inter- 
esting interactive potential by allowing students 
to grab and arrange still images onto their own 
notepad windows. 

Cinema would seem a natural subject for com- 
puter-accessed study. A number of scholars are 
exploring ways of digitizing moving images and 
interactively combining them with text. Some 
are using a computer interface with laserdisc to 
create analyses of a single film (such as The 
Rebecca Project, by Lauren Rabinovitz and 
Greg Easly at Iowa, which analyzes the
Hitchcock film from a number of critical and 
historical perspectives). Others, such as Robert 
Kolker, one of the authors of this paper, and 
Stephen Mamber at UCLA, are experimenting 
with critical essays using moving images pub- 
lished on the Web and multimedia explorations 
of the basic cinematic vocabulary, using digi- 
tized clips and authored in Asymetrix Toolbook. 

Language learning and linguistics are fields of 
major exploration; a number of interactive projects
are based on both existing and new software.
Asian languages have received special attention in 
multimedia teaching programs. Ohio State 
University and the University of Maryland’s
University College are creating a multimedia version
of a standard Japanese textbook in a project
funded by the Annenberg/CPB Project.

Linguistics scholars are developing computer-assisted
principle-based parsers (which can give
structural descriptions of sentences in more
than one language). A database of children’s
spontaneous speech known as CHILDES has
been developed at Carnegie Mellon University
with NSF funding. 



FUTURE RESEARCH NEEDS 
This very brief and selective survey indicates the
plurality of tools and content developed in
computer-based and computer-assisted humanities
research. What it does not reveal are the
intense efforts now being made, and yet to be 
undertaken, to bring this work to students. 

Many projects are made for student interaction,
but interface designs are as diverse as the projects
themselves, requiring skills specific to the
project, and student access to computer facilities
is far from universal. A very few colleges
and universities provide every incoming student 
with a computer. Others have developed com- 
puter lab facilities in which students can do 
their work. Relatively few have interactive com- 
puter teaching theaters where faculty and stu- 
dents can learn in an environment that allows 
close association between human and machine. 

The need for access to hardware is coupled with 
an even greater need for access to training. 

Major curricular issues are at stake if computer- 
aided research and pedagogy are to expand. 
Introductory courses in computation need to be 
developed for all students outside the usual 
computer science curriculum. Humanities fac- 
ulty members need to be trained in graphical 
environments so that they can enjoy access to 
existing humanities content and begin to take 
part in multimedia authoring. 

Work is needed on ways to bring the necessary
training to humanities scholars that will 1)
inform them of how computers can aid their
research; 2) make them comfortable with computer-based
tools; and 3) identify and then
encourage those who wish to do advanced work
in creating tools for teaching and research. This
effort must be carried on concurrently with
research into the kinds of interface design that
would be best for most humanities users.

A major barrier to these users is the lack of 
interface standards and the need for specialized 
skills to create and access content. As we said 
earlier, the work of the humanities is a diverse 
undertaking with multiple points of view and 
multiple content. It is an area of many specializations,
whose practitioners may not have the
skills or the time to devote to authoring and
programming. Barriers to access of a variety of
computer applications need to be lowered.
Standards for multimedia authoring and usable
interfaces that can be easily modified to accommodate
a variety of content would be extremely
useful. Sound, image, and video capture must be
simplified and standardized, as should the programs
for integrating them. Not all universities
have software development units available to fac- 
ulty. In the absence of those, all interested facul- 
ty should be able to access simple, universal 
tools (for example, HTML and the World Wide 
Web). Stand-alone applications need to incorpo- 
rate a similar, even simpler, set of standards. 

Such work would ideally combine the talents of 
computer scientists, human-computer interaction
researchers, and humanities scholars
developing content and tools. Once these tasks 
were accomplished, computer technology would 
facilitate the needs of the humanities and yet 
remain in harmony with the diverse, exploratory
nature of work in the humanities.

STATE OF THE ART

The arts and humanities focus on the study of cultural objects. Knowledge in the arts and humanities
can consist of cultural objects themselves, information about those cultural objects, interpretive commentary
on those objects, and links or relationships between them. The nature of the objects under
study is so broad that the knowledge associated with them can be almost anything, and it can be used
and reused for almost any purpose. For example, a text can be part of a large collection studied for literary,
linguistic, and historical purposes. The same text can also be analyzed in very fine detail, perhaps
even for the punctuation within it and for the physical characteristics of the original source.



The representation of knowledge in electronic format can itself take many forms, and it has taken
more than forty years of work with electronic resources to begin to understand the potential and the
perils of some of these formats. Material consisting of text, numeric data, images, sound, and
video now exists in electronic form. At a fundamental level, all of these are represented in electronic
form as bits, but it is the higher levels of representation (the forms into which the information is 
organized and the access points to those forms) that define how useful that electronic information 
might be. 

Early projects worked mostly with text, and the efforts of these projects show some of the possible 
pitfalls in choosing how to represent information. These projects attempted to transcribe electronic 
text by maintaining as accurate a reproduction of the source as possible. Typographic features such as 
italic type and footnotes were copied faithfully, making an explicit representation in a different medium
(electronic form) of a property of the original medium (print). Typographic features aid the reading
process the human performs, but they are ambiguous and so are less suitable to aid any
processing done by a machine. It took many years to begin to understand some of the differences
between representing knowledge that is intended only to be read, and representing knowledge that 
can be processed electronically in different ways. 



Much of our knowledge about objects or information is implicit in some way or another. We know 
that the text along the top of a page is a running heading because that is where a heading is normally
placed. We can deduce the context or scene depicted in a painting because we know, for example,
that the figures shown appear in a particular biblical story in that context. When we see a film clip
we can recognize the place where the action is happening or detect a foreign accent in one of the
speakers. When we browse a dictionary we know that the item in boldface type at the beginning of 
an entry is the headword. But when we start to manipulate any of these items electronically this lack 
of specificity becomes apparent and contextual or other information is needed. 




The question then arises of what knowledge should be stored to provide this explicit information. In
very many information systems, the representation of knowledge is tied up in some way with particular
data structures. Early systems stored databases as “flat files” or single tables with one set of rows
and columns, which inevitably meant some restructuring of data before it was entered into the computer
to avoid repetitions and to deal with anomalies. Historians and archaeologists commonly complained
that this led to simplification of the material. Relational databases provide a more
sophisticated data model, but can also suffer from some problems. Not all humanities data fits neatly 
into sets of tables without some loss of information. Furthermore, the relationships between the 
items of data need to be defined when the database is initially set up, yet many collections of 
humanities material are put into electronic form in order to do research that will help to establish
the relationships between the items in the collection.

In many current systems the representation of
knowledge depends on specific software pro- 
grams. When items or objects are indexed and 
access to them is only via special-purpose soft- 
ware that can read those indexes, some of the 
knowledge becomes dependent on the software 
and is derived through functions of the soft- 
ware. In some cases it is not even possible to 
extract the information in the format in which 
it was entered. Moreover, knowledge that has 
been created for a specific program or type of 
computer is less likely to last for a long time. 
Even if it can be converted easily from one pro- 
gram to another, something may be lost during 
the conversion, or a different theoretical orien- 
tation may be imposed on the material. 
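
The contrast drawn earlier in this section between a flat file and a relational model can be made concrete with a small sketch. It is only an illustration under stated assumptions: the records, the table names, and the `periods_for` helper are hypothetical, and Python's standard `sqlite3` module stands in for any relational system. The flat file must repeat a whole row to record a disputed dating, while the relational form keeps one row per object and as many datings as the evidence supports.

```python
import sqlite3

# Hypothetical flat-file rows: (object_id, description, site, period).
flat_rows = [
    (1, "amphora", "Site A", "Hellenistic"),
    (1, "amphora", "Site A", "Roman"),   # whole row repeated: dating is disputed
    (2, "lamp", "Site B", None),         # anomaly: no period recorded
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE object (id INTEGER PRIMARY KEY, description TEXT, site TEXT);
    CREATE TABLE dating (object_id INTEGER REFERENCES object(id), period TEXT);
""")

# Split each flat row into one object record plus zero or more datings.
seen = set()
for object_id, description, site, period in flat_rows:
    if object_id not in seen:
        conn.execute("INSERT INTO object VALUES (?, ?, ?)",
                     (object_id, description, site))
        seen.add(object_id)
    if period is not None:
        conn.execute("INSERT INTO dating VALUES (?, ?)", (object_id, period))


def periods_for(object_id):
    """Return every period proposed for an object, in alphabetical order."""
    rows = conn.execute(
        "SELECT period FROM dating WHERE object_id = ? ORDER BY period",
        (object_id,),
    )
    return [row[0] for row in rows]
```

Even here the table layout had to be fixed before any data was entered, which is the limitation the text describes: a relationship nobody anticipated has no table to live in.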

Meta-data, or knowledge about the knowledge, 
is another way of making implicit information 
more explicit. Some communities recognized 
the importance of meta-data early on: for exam- 
ple, bibliographic and cataloging data is still 
fundamentally a means of using electronic 
means to describe material that is mostly not in 
electronic form at present. In the 1970s the 
social science data archiving community created 
a system for describing its datasets, and these 
codebooks are almost universally accepted as an 
essential part of a dataset. Initially created in 
print form, some are now being converted into
electronic form. Meta-data for electronic textual 
material is in a much more rudimentary form at 
present, and very few electronic texts have what 
would now be considered adequate information 
associated with them. Our understanding of the 
meta-data requirements for images, sound, and 
video lags even further behind.

CURRENT RESEARCH 
Research during the last ten years has concen- 
trated on establishing ways of storing knowl- 
edge in electronic form so that it does not 
become obsolete, so that it can be reused for 
different purposes, and so that it is separate 
from any software that will process it. The 
Standard Generalized Markup Language 
(SGML) provides a way of representing information
that is independent of any particular
hardware or software. For text it consists of 
plain ASCII files that can be transmitted across 
any platform and network. SGML is object-ori- 
ented. It does not say anything about what will 
happen to those objects when they are 
processed electronically: it merely says what 
they are. Thus different processing programs 
can operate on the same SGML data. An added 
benefit of using SGML is the ability to defer
making many decisions which might otherwise
have to be made at the start of a project, and 
which are often regretted later. 
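
The point that descriptive markup says what an object is, and leaves processing to separate programs, can be sketched briefly. This is a minimal illustration and not an SGML implementation: it assumes an XML fragment as a stand-in for SGML (Python's standard library has no SGML parser), and the element names and both functions are hypothetical. Two unrelated processes read the same marked-up entry.

```python
import xml.etree.ElementTree as ET

# Hypothetical dictionary entry; the tags describe content, not appearance.
ENTRY = """<entry>
  <headword>codex</headword>
  <pos>noun</pos>
  <definition>a manuscript in book form, as opposed to a <term>scroll</term></definition>
</entry>"""

root = ET.fromstring(ENTRY)


def build_index(entry):
    """Process 1: use the markup for retrieval (headword -> part of speech)."""
    return {entry.findtext("headword"): entry.findtext("pos")}


def render_plain(entry):
    """Process 2: use the same markup to produce a reading version."""
    headword = entry.findtext("headword").upper()
    definition = "".join(entry.find("definition").itertext())
    return f"{headword} ({entry.findtext('pos')}): {definition}"
```

Neither function needed the other's decisions to be made in advance; a third process (for example, collecting every `term`) could be added later without touching the data.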

SGML can be used to describe anything.
Although principally text-oriented, it does not
have to work only with text. It can be used for
the textual information that must accompany
images, sound, and video in order for them to 
be useful. SGML is not itself an encoding 
scheme; it is a meta-language within which 
encoding schemes (SGML tag sets) can be 
defined. The Text Encoding Initiative (TEI), a
major international project in humanities com- 
puting and the language industries, has created 
an SGML tag set suitable for many different 
applications. Using a modular document struc- 
ture, the TEI can be used to represent many 
different text types and many different theoreti- 
cal orientations. It has tags for the structural 
components of many text types, and also
includes tags for analytic and interpretive information.
It also has a set of tags which
provide an electronic text file header that
includes meta-data of various kinds. Another
humanities-related SGML application is the
Finding Aids project at Berkeley.

The acceptance of SGML is now widespread for 
commercial as well as academic applications. Its 
focus on content is appealing, especially when it 
is not possible to define all the likely functions
that can be performed on an electronic text at 
the start of a project. For text it also enables the 
meta-data to be encoded using the same syntax 
as the text itself, which is attractive for process- 
ing purposes. SGML software is now becoming 
much more widely available, and the recent 
announcement by Novell of an SGML Edition 
of WordPerfect 6.1 should help to put SGML 
in mainstream computing. However, SGML 
basically assumes a single hierarchical structure 
for a document. Most humanities material has 
multiple parallel hierarchies, or can even be 
viewed as webs of information. Efforts to repre- 
sent these in the current version of SGML are 
clumsy, since almost all SGML software
assumes a single tree structure for processing. 

The Hypertext Markup Language (HTML) used
by the World Wide Web has perhaps done more
than anything to raise awareness of structured
text. Even if it does not survive, it will leave a
large legacy of text marked up in an SGML-like
way. The World Wide Web has also enabled
many more people to be aware of network-wide
resources in different forms and of the possibility
of linking or pointing to information stored elsewhere
on the network. However, the current version
of HTML does have limitations in the kinds
of material that it can represent, and its encoding
tags are mostly presentational. Its meta-data
capabilities are also weak.

Alternative approaches to representing text 
focus more on the appearance of a document. 
This means that the document is easy to read, 
but the method is less suitable for long-term 
storage of material that could be used for many 
different applications. 



The need to represent missing or incomplete
information in some way is now reasonably well
accepted. In some cases it may be important to
distinguish between information that does not
exist in any way and information that can exist
but is not known for this particular instance.
The level of certainty about information in arts
and humanities data can also be critical, and it
is useful to give an indication of this. Similarly,
it can be helpful to record who is responsible
for decisions about uncertain information or
other encoding, and their role in making those
decisions. 
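
One way to picture these distinctions is a field that carries its own status, certainty, and attribution. The sketch below is an assumption-laden illustration, not an established scheme: the `Status` values, the `Field` class, the 0-to-1 certainty scale, and the example editor identifier are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    KNOWN = "known"
    UNKNOWN = "unknown"                # a value can exist but is not known here
    NOT_APPLICABLE = "not applicable"  # no value exists in any way


@dataclass
class Field:
    value: Optional[str]
    status: Status = Status.KNOWN
    certainty: Optional[float] = None  # assumed scale: 0.0 (guess) to 1.0 (certain)
    responsible: Optional[str] = None  # who made the decision
    role: Optional[str] = None         # in what capacity they made it

    def display(self) -> str:
        """Render the field without hiding missing or uncertain information."""
        if self.status is not Status.KNOWN:
            return f"[{self.status.value}]"
        text = self.value or ""
        if self.certainty is not None and self.certainty < 1.0:
            text += f" (certainty {self.certainty:.0%}, per {self.responsible})"
        return text


place = Field("Athens", certainty=0.7, responsible="editor-1", role="encoder")
birth = Field(None, status=Status.UNKNOWN)         # the person was born; date lost
death = Field(None, status=Status.NOT_APPLICABLE)  # the person is still living
```

Keeping the two kinds of absence as separate values means a later search for "objects whose date is unknown" does not also return objects that could never have had one.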



A multiplicity of so-called “standards” exist at 
present for storing images, sound, and video. 
Conversion from one to another is usually pos- 
sible, perhaps with some loss of information. 
Some work has been done in the area of meta- 
data associated with these formats, but in gener- 
al this consists of moving information from one 
system to another in such a way that it can be 
processed (as opposed to merely being viewed 
or heard). Size is still a constraint for these types 
of data, and much effort is of necessity being 
concentrated on compression techniques for 
storage and transmission rather than on representation
of the information itself.

A number of other representational issues are 
important for arts and humanities material. 
Non-standard characters appear regularly. There 
are many different ways of dealing with these, 
most of which are incompatible with each other 
or are functions of specific software programs. 

In some cases the writing system and the language
are treated as the same thing, although
only rarely do they have a one-to-one relationship.
SGML offers some general-purpose solutions,
but these do not appear to be very well
implemented at present, and barely at all on the 
World Wide Web. Dates can be in different cal- 
endar systems or can be vague forms like 
“Hellenistic,” but they need to be represented 
in ways that enable them to be put into chrono- 
logical order. Similar problems arise with 
weights and measures, where the units can vary 
from one culture to another. Names and their 
relationship to individuals who bear them can 
also be important. The same name, referring to
the same person, can be spelled in different 
ways. There may also be several individuals with 
the same name in a collection of material, giv- 
ing rise in some cases to doubt about whether it 
is the same person or not. 
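
The requirement that vague dates still sort chronologically can be met by storing each date as an earliest and latest bound alongside its original label. The sketch below is one possible approach under stated assumptions: the `DateRange` class and `bce` helper are hypothetical, years use astronomical numbering (1 BCE is year 0), the bounds given for "Hellenistic" (323 to 31 BCE) are a common convention rather than a fixed fact, and conversion between calendar systems is not attempted.

```python
from dataclasses import dataclass


def bce(year: int) -> int:
    """Convert a BCE year to astronomical numbering (1 BCE -> 0, 2 BCE -> -1)."""
    return 1 - year


@dataclass(frozen=True)
class DateRange:
    label: str     # the date as the source expresses it
    earliest: int  # earliest possible year, astronomical numbering
    latest: int    # latest possible year, astronomical numbering

    def sort_key(self):
        # Order by earliest bound, then by latest bound for ties.
        return (self.earliest, self.latest)


items = [
    DateRange("1066 CE", 1066, 1066),
    DateRange("Hellenistic", bce(323), bce(31)),  # conventional bounds, assumed
    DateRange("c. 450 BCE", bce(460), bce(440)),  # "circa" widened by ten years
]

ordered = sorted(items, key=DateRange.sort_key)
```

The original label is kept so that the scholar's wording survives; only the bounds are used for ordering, and how wide to make them remains an editorial decision.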




CRITICAL AREAS FOR FURTHER RESEARCH

Much electronic information in use today has 
been created with the aim of making a surrogate
of something that already exists in another
format. Many of the functions performed on
that information are the same as those that
might be performed on the original: reading, 
viewing, etc. The electronic environment facili- 
tates other types of processing and analysis. 

Some of these, like statistical analysis of social 
science data or text retrieval, are fairly well
understood. Others have been barely thought of 
as yet, but future scholars will probably want to 
subject electronic information being created 
today to new and different forms of processing. 
Gaining a better understanding of the full
potential of the electronic medium ought to 
help us create better and more useful represen- 
tations of material in electronic form. 

Electronic information is mutable and dynamic: 
changes can be made to it at any time. Tracing 
those changes becomes important for future 
users, but we do not yet have a universally rec- 
ognized way of recording these. For text we no 
longer need to write in a single linear stream, 
stored on rectangular objects like those on 
which we have written for centuries. We are 
already seeing hypertext fiction in which the 
novel has no obvious beginning, middle, or
end. This is still in an experimental stage, but 
we can envisage hypertext writing of scholarly 
papers in which differing arguments or inter- 
pretations are presented in parallel as hypertext 
links rather than as a single stream of text. 
Raising awareness of the potential of the elec- 
tronic medium may thus also help us to create 
better representations of information. 

Current methods of recording meta-data seem
to be concentrating on the properties of the
original from which the electronic surrogate is
made (for example, indexing terms, traditional
catalog records, and the like). Yet the properties 
of the electronic version can also be important. 
The TEI header is one of the very few attempts 
to provide meta-data that records the process of 
creating the surrogate. It includes information 
about the transcription of the text, whether 
spelling has been normalized, and the treatment 
of potentially ambiguous characters like the 
period. A similar model might be needed for 
other types of data. Current methods of record- 
ing meta-data also seem to be intended mainly 
for humans to use, but it is likely that in the 
future they will be read and acted on just as 
much by computer programs; further research is 
needed to establish exactly how this might work 
and what kinds of interoperability are possible
between meta-data systems.

With the World Wide Web, we have a glimpse 
of the potential of a global network of linked 
resources, where linking mechanisms are likely 
to become more and more important. In some 
ways they are fundamental to much work in the
humanities, which is about making connections 
between items of information and associating 
some intellectual rationale with those connec- 
tions. At a more practical level, we need ways of 
linking transcriptions of text to the digital 
image of the source at a fine level of granularity,
and of linking areas of an image to descriptive 
information about those areas. In most current 
systems links are software dependent and can 
only be created and accessed via that software.
HyTime, the SGML application for hypermedia
time-based information, provides one
method of software-independent linking. The
TEI Guidelines incorporate a set of linking
mechanisms modeled on those in HyTime. 

Both of these have been little used so far
because of a lack of suitable software. More
research needs to be done to determine how
effective and how usable they are.

Making a link between two or more items 
implies that a relationship exists between them. 
The reason for the link is important, and what
is needed is a method of representing that reason
as well as a way of saying who created the
link. It may be that conflicting reasons exist, in
which case all need to be represented without
one being privileged all the time. Pointers can
be multi-headed, in which case all pointers
leading from a single item ought to be documented.
Links need to be made from a single
point or span of information to another single
point or span of information. 
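
These requirements suggest a link record that is stored as data in its own right instead of being buried in one program. The following is a minimal sketch under assumptions: the `Span` and `Link` classes, the resource names, and the creator identifiers are hypothetical, and positions are simplified to numeric offsets (an image region would need coordinates instead).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Span:
    resource: str  # identifier of the text, image, or other object
    start: int     # a single point is a span where start == end
    end: int


@dataclass(frozen=True)
class Link:
    source: Span
    targets: Tuple[Span, ...]  # more than one target makes the link multi-headed
    reason: str                # why the link was made
    creator: str               # who asserted it


def links_from(links: List[Link], resource: str, offset: int) -> List[Link]:
    """Return every link whose source span covers the given offset."""
    return [
        link for link in links
        if link.source.resource == resource
        and link.source.start <= offset <= link.source.end
    ]


line = Span("transcription.txt", 120, 158)
links = [
    Link(line, (Span("page-image", 40, 95),), "transcribes this region", "encoder-1"),
    Link(line, (Span("commentary-a", 0, 300), Span("commentary-b", 10, 80)),
         "alludes to", "scholar-2"),
    Link(line, (Span("commentary-c", 0, 120),), "does not allude to", "scholar-3"),
]

found = links_from(links, "transcription.txt", 130)
```

The two conflicting readings by different creators are both returned, in the order they were recorded; nothing in the structure privileges one over the other.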



Representing what can be referred to here as 
“derived knowledge” is also likely to become 
more important. Derived knowledge is the 
result of some processing of electronic infor- 
mation (for example, some form of linguistic 
analysis, or image processing). It may be that, in 
the current state of our software, such a proces- 
sing program is not entirely accurate (for exam- 
ple, a word-class tagger giving about 96 percent 
accuracy), but the processing may take a long 
time and yield results worth keeping. Ways 
must be found to associate this with the original 
material, which also enables the derived knowl- 
edge to be updated and amended both auto- 
matically and manually. 
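
One way to associate derived knowledge with its source, while keeping it open to both automatic and manual revision, is to store it apart from the text and point back by character offsets. The sketch below is an illustration only: the toy lexicon stands in for a real word-class tagger and deliberately mis-tags one word, and all names (`Annotation`, `auto_tag`, `merge`) are hypothetical.

```python
from dataclasses import dataclass

TEXT = "Time flies like an arrow"

# Toy stand-in for an imperfect automatic tagger: "flies" is wrongly a NOUN.
LEXICON = {"time": "NOUN", "flies": "NOUN", "like": "PREP",
           "an": "DET", "arrow": "NOUN"}


@dataclass(frozen=True)
class Annotation:
    start: int   # character offsets into TEXT; the text itself is never altered
    end: int
    tag: str
    source: str  # which program or person produced this result


def auto_tag(text):
    """Tag each whitespace-separated word, keyed by its (start, end) offsets."""
    annotations, pos = {}, 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        tag = LEXICON.get(word.lower(), "UNK")
        annotations[(start, end)] = Annotation(start, end, tag, "toy-tagger")
        pos = end
    return annotations


def merge(automatic, manual):
    """Manual corrections override automatic results for the same span."""
    merged = dict(automatic)
    merged.update(manual)
    return merged


manual = {(5, 10): Annotation(5, 10, "VERB", "human reviewer")}
derived = merge(auto_tag(TEXT), manual)
```

Because the corrections are held separately, the tagger can be rerun when it improves and the manual layer merged again without being lost.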

For the more immediate future, ways of repre- 
senting some kinds of source material must be 
developed further to bring them up to the level 
that already exists for other types. Current 
methods of encoding newspapers, papyri, 
inscriptions, text on vases and other artifacts, 
early printed books, spoken texts, and historical 
dictionaries are acknowledged to be primitive at 
present. Linguistic information will become 
increasingly important as we look toward better
retrieval methodologies, and the multilingual 
aspects of this are very relevant for the arts and 
humanities. Another area of direct concern at 
present is what to do about the large amounts 
of “legacy” data that is already in electronic 
form, but represented in a way that is now 
acknowledged to be deficient. Research is need- 
ed to perform more “intelligent” conversions 
that can begin to handle at least some of the
incomplete representation of the original in the 
electronic source. 

Cost factors also need to be examined in more 
detail. Given the high cost of creating electronic 
resources, it seems important to represent the 
information in such a way that it can be used 
for many different purposes. The scheme ought
also to be incremental, thus enabling new 
knowledge to be added to already existing infor- 
mation. In the arts and humanities, the quality 
of the information is also extremely important. 
People are often unwilling to use material that
is perceived to be inferior in quality to the orig- 
inal. Electronic texts that have obvious typo- 
graphical errors have been heavily criticized, as 
have low-resolution images in which the detail 
cannot be easily seen. Research is needed to
determine what is the minimum level of quality
acceptable to most users, what are the circumstances
where a very high level is essential, and
what are the relative costs associated with this.

Electronic technology has begun to change what information is available [6] and how that information
is located and used. The first of such changes are related to remote access: Instead of traveling to
the sources of information, scholars use technology to bring information to them. One important 
consequence of remote access is the broadening of access to students and other novices who would 
not or could not bear the time and financial costs to travel to libraries, museums, and research insti- 
tutes, and who might not know what to look for once they arrived. Second, electronic technology 
brings new genres of information that provide new challenges for search and discovery (e.g., multi- 
media, interactive ephemera, etc.). Electronic technology exacerbates the traditional problems 
humanities scholars have found in documenting and locating non-textual materials. Third, change is 
due to electronic tools and the strategies that electronic representations made possible. The emphasis 
here is on tools and strategies for resource search and discovery, although we will continue to see 
closer integration with tools and strategies for creating, using, and communicating information. Such 
developments imply that creators who choose to become more closely involved with consumers must 
take more responsibility for documenting their work and making it accessible. 

In archives, libraries, and museums, search and discovery are facilitated by finding aids, catalogs, and
guides that organize the information space for information seekers. Similar devices are appearing for
electronic resources as well. An ongoing research challenge is to discover appropriate representations
for information and new search and discovery tools and strategies that leverage the strengths of com- 
puters and telecommunications networks. 

Search implies an effort to locate a known object; the information seeker has in mind specific char- 
acteristics or properties of the object, which are used to specify and guide search activity. Discovery 
implies an effort to explore some promising space for underspecified or unknown objects; the infor- 
mation seeker has in mind general characteristics or properties that outline an information space in 
which perceptual and cognitive powers are leveraged to examine candidate objects (elsewhere [10] I 
have distinguished search and discovery as analytical and browsing information seeking strategies, 
respectively). In general, discovery emphasizes the location of the promising space, such as a collec- 
tion or resource (e.g., [2]). Electronic technology provides new tools for each of these classes of 
strategies and also blurs the traditional boundaries between them.

STATE OF THE ART 

Scholarly search and discovery depend on mappings between conceptual space and physical locations. 
Classification systems organize information objects, thesauri map these organizations onto word 
labels, and catalogs provide pointers from labels to physical objects. Traditionally, there have been 
clear demarcations between n-ary information objects such as indexes and catalogs, and primary information
objects such as books and physical artifacts. The Internet includes n-ary and primary information
objects, and today's interfaces make little distinction between these representations, effectively
blurring these boundaries. Thus, electronic technology influences information seeking by changing
both the traditional tools that support search and the strategies used for information seeking. Any
attempts to develop cataloging schemes for Internet resources must not only take into account these 
differences but also address the difficulty of documenting dynamic and ephemeral information objects 
such as ftp and Web sites. It is certainly too soon and probably wrong to aim at developing collection 
development policies and a master catalog for the Internet as a whole. Nonetheless, specific digital 
libraries and resource collections have begun to take advantage of information retrieval and information-seeking
research to make information more easily and readily available.

Search

Information retrieval research has yielded sever- 
al approaches to the problem of matching 
queries to documents and object surrogates. 
These approaches have traditionally been 
applied to specific collections of documents 
(one set of resources) rather than across many 
different collections. The most basic advantage 
of text in electronic form is the ability to do 
string search — to locate all occurrences of a 
string of characters in a text or corpus.

Although many algorithms support string 
search, inverted file indexes are used in most 
large-scale systems to support free-text searching.
Building on string search techniques, scholars
are able to develop concordances (e.g., the
Dead Sea Scrolls) and explore word usage frequencies
across authors or works (e.g.,
Thesaurus Linguae Graecae with Pandora).
Although many of these efforts are currently 
restricted to stand-alone, proprietary collec- 
tions, some are available through the Internet. 
There has been little progress in indexing non- 
textual materials, although scene changes and 
color patterns have been used to augment video 
and graphical databases. Most non-textual 
objects are located through textual descriptions 
or linear scanning. 
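
The inverted file index and the concordance built on it can be sketched in a few lines. This is a minimal illustration, not the method of any system named above: the two sample documents, the lowercase letters-only tokenizer, and the function names are assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical miniature corpus.
docs = {
    "d1": "In the beginning was the word",
    "d2": "The word was written on a scroll",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(docs):
    """Map each token to the list of (document id, token position) pairs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for position, token in enumerate(tokenize(text)):
            index[token].append((doc_id, position))
    return index


def concordance(docs, index, word, width=2):
    """List each occurrence of a word with `width` tokens of context per side."""
    lines = []
    for doc_id, pos in index.get(word, []):
        tokens = tokenize(docs[doc_id])
        left = " ".join(tokens[max(0, pos - width):pos])
        right = " ".join(tokens[pos + 1:pos + 1 + width])
        lines.append((doc_id, left, word, right))
    return lines


index = build_inverted_index(docs)
kwic = concordance(docs, index, "word")
```

Looking a word up in the index avoids scanning every text for every query, which is why large systems prefer it to plain string search; word frequencies fall out of the same structure as `len(index[word])`.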

Another major development in search is the 
ability to rank documents according to one of 
many statistical or probabilistic algorithms that 
use word or phrase frequency data to match 
queries with documents and rank results accord- 
ingly. Although such activities are computation- 
ally intensive, today's computers are able to 
manage representations of documents as n- 
dimensional vectors and compute similarity 
measures for documents and queries in n- 
dimensional space. These approaches have
gained commercial appeal (e.g., Dialog's Target
and Lexis/Nexis Freestyle); many Internet
resources are now using statistical or probabilistic
search engines on their servers (e.g., several
WAIS-based services are available; the Library of
Congress Thomas system uses the InQuery search
engine). In most cases these approaches provide
keyword access (based on all words in the corpus
except some small set of common words) rather 
than subject access (based on a carefully con- 
structed controlled vocabulary used by indexers 
to describe the content of the object). Although 
ranked retrieval offers good advantages to novice 
searchers and a viable alternative to Boolean- 
based search for experienced searchers, we are a
long way from providing all and only relevant
information to information seekers who pose
word-based queries.
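
Ranked retrieval of this kind can be illustrated with the simplest vector model. The sketch below assumes one common weighting (term frequency times a smoothed inverse document frequency) and cosine similarity; it is not the algorithm of any product named above, and the three sample documents and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical miniature collection.
docs = {
    "d1": "greek tragedy and greek comedy",
    "d2": "roman comedy",
    "d3": "medieval manuscripts",
}


def tf_idf_vectors(docs):
    """Represent each document as a sparse vector of term weights."""
    tokenized = {d: text.split() for d, text in docs.items()}
    n = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(t for tokens in tokenized.values() for t in set(tokens))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = {
        d: {t: count * idf[t] for t, count in Counter(tokens).items()}
        for d, tokens in tokenized.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs):
    """Return ids of documents with nonzero similarity, best match first."""
    vectors, idf = tf_idf_vectors(docs)
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query.split()).items()}
    scored = [(cosine(q, v), d) for d, v in vectors.items()]
    return [d for score, d in sorted(scored, reverse=True) if score > 0]
```

The ranking rests entirely on shared words, which shows the limit noted in the text: a document about Aristophanes that never uses the word "comedy" would score zero for the query "greek comedy".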

A third set of approaches to searching leverages 
the logic of discourse or substantial knowledge 
bases to contextualize queries or to possibly 
modify them. For example, the Perseus system 
[4] includes a morphological analyzer that goes 
beyond string search to provide variant forms 
for Greek words. Some linguists aim to develop 
generic grammars that represent the domain of 
possible logical statements and parsing routines 
that map natural language queries and docu- 
ments onto the grammar. Other researchers 
have developed schemes for taking advantage of 
meta-knowledge provided by authors or publishing
specialists. For example, the Text
Encoding Initiative (see Hockey paper in this 
collection) promotes the use of SGML coding 
in scholarly texts so that information seekers 
can use these codes for locating and analyzing 
texts. Another line of research aims to develop 
thesauri (e.g., the Art & Architecture Thesaurus)
that provide controlled entry points for information
seekers as they formulate queries or that
are applied automatically to modify or expand
queries during the retrieval process. Proficient 
searchers can certainly use a thesaurus to good 
advantage, but automatic query expansion 
based on a thesaurus has not generally yielded 
improved search results (e.g., [8] and [12]). 

A fourth class of research aims to develop filter- 
ing systems that automatically route potentially 
relevant information to scholars. Search 
depends on specification of the sought object 
and filtering depends on specification of the 
user. Libraries have traditionally selectively dis- 
seminated information to scholars, devoting 
human effort to scan information services 
according to institutional and individual interest
profiles. Online services allow users to define
interest profiles (usually word based), then alert 
them when information objects arrive that fit 
the profile (e.g., document delivery services 
such as UnCover). Different implementations 
may use any combination of the search algo- 
rithms above. On the Internet, several network 
news filters adapt as users provide positive and 
negative feedback, and there are programs of 
research to develop active agents that roam the
network to locate profile-appropriate information
and sometimes cooperate with other software
agents.³
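
A word-based interest profile that adapts to positive and negative feedback
can be sketched as follows. This is only an illustration under simple
assumptions: the class name, the weighting rule, and the threshold are
invented here and do not describe any particular alerting service or news
filter.

```python
class InterestProfile:
    """A word-based interest profile that adapts to user feedback."""

    def __init__(self, words):
        # Each profile word starts with the same weight.
        self.weights = {w.lower(): 1.0 for w in words}

    def score(self, text):
        """Sum the weights of profile words that occur in the text."""
        tokens = set(text.lower().split())
        return sum(wt for word, wt in self.weights.items() if word in tokens)

    def matches(self, text, threshold=1.0):
        """Decide whether an arriving item should be routed to the user."""
        return self.score(text) >= threshold

    def feedback(self, text, relevant, step=0.5):
        """Raise weights for words in relevant items; lower known words otherwise."""
        delta = step if relevant else -step
        for word in set(text.lower().split()):
            # New words enter the profile only through positive feedback.
            if word in self.weights or relevant:
                self.weights[word] = self.weights.get(word, 0.0) + delta
```

Note that the profile describes the user rather than any single sought
object, which is the distinction between filtering and search drawn above.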

Finally, some research has attempted to automate
traditional reference and question-answering
services. Early efforts used expert-system
technology to automate selected reference services;
today's efforts aim to go beyond the simple
frequently asked question (FAQ) services to
develop multiple tiers of online reference sup- 
port (e.g., 11]). 

Discovery 

Browsing has many attractions for scholars: 
exploration, contextualization, and serendipity 
support the discovery of new connections 
between known ideas as well as pertinent new 
informational resources. In manual environments,
browsing has been done in specific collections
(e.g., a section of shelves). Electronic
technology in general and the Internet in particular
have greatly expanded the universe of
browsable material by bringing it to the infor- 
mation seeker at the desktop. Because the 
Internet connects a multitude of collections (on 
all topics, in various media, and using different 
organizational schemes), discovery has become 
complicated by the need to first limit browsing 
to a set of resources. Developing tools and 
strategies for identifying resources to browse 
this wealth is thus a primary research challenge. 

One form of guided discovery is exemplified by 
hypertext systems. Most hypertexts use explicit 
links denoted by link anchors (buttons, high- 
lighted text) to suggest routes for users to fol- 
low. In stand-alone hypertext systems (i.e., 
specific collections), users can navigate effective- 
ly by following explicit links. Many scholars 
consider such links to be editorial acts; thus 
aggregations of existing materials woven togeth- 
er with hypertext links represent added-value 
derivative works at least and original scholarly 
interpretations at best. The immense popularity 
of the World Wide Web is based on the ease 
with which users can follow hypertext links 
with public-domain and easy-to-use client soft- 
ware often called browsers (e.g., Mosaic, 
Netscape). Hypermedia systems such as Perseus 
and Piero press the links further by offering 
implicit or computed links that are made available
as the results of queries entered by the user.
Electronic texts that use SGML or other 
markup codes can also offer on-the-fly link con- 
structions that allow information seekers to fol- 
low paths defined by their articulated needs 
rather than predefined links provided by 
authors or editors. Other approaches include 
dependencies based on system state (e.g., Petri 
nets) and scripts that compute links based on
user behavior. Even after users have limited
their discovery to a set of pertinent resources,
personal discipline is required to remain within
that set (e.g., today's browsers do not dynamically
limit links to the sites contained in a preliminary
selection of resources).

Discovery depends on both locating candidate
objects and recognizing relationship(s) between 
them and the problem under investigation. The 
interplay between the perceptual aspects of 
browsing and the cognitive aspects of reflection 
and evaluation is best supported by systems that 
present accurate and well-documented represen- 
tations (i.e., authors or their agents are explicit 
about their perspective) for objects and allow 
users rapid and precise control. Direct manipu- 
lation interfaces (see Kolker and Shneiderman 
paper in this collection) best illustrate such 
interfaces in computing environments. 
Developments such as the use of thumbnail 
images as well as text-based descriptions provide 
new types of surrogates for information objects 
and support rapid scanning and browsing. 
Multiple levels of representation for texts are 
emerging in networked environments as users
move from the entire Internet to a subset (pos- 
sibly ranked) of resource titles to outlines or 
tables of contents for specific objects to extracts 
from the objects, to the full representation of 
the object, and eventually to related objects. 

Integration of Search and Discovery 
Because electronic environments are blurring 
demarcations between search and discovery
strategies, several developments suggest research 
directions. First, one way to improve the results 
of a search is to use relevance feedback. Given a 
set of objects retrieved for a query, users may be 
able to identify which are appropriate to the 
need and which are not. These judgments are
fed back to the system, and either the original
query is modified or a new query is formulated
that combines the original query with the additional
information gained through feedback.
Relevance feedback illustrates the linkage 
between search and discovery — a search query 
serves to identify an intellectual neighborhood 
for the information to examine (often by browsing),
and the results of the examination are used
to refine the neighborhood. This process mirrors 
what information seekers do in manual environ- 
ments, but the computational tools multiply the 
number of iterations possible per unit time. Just 
as rapidly displayed, coordinated still images 
become moving pictures beyond thresholds of 
10 to 15 frames per second, this quantitative
increase may lead to qualitative shifts in search
and discovery. One possible avenue of develop- 
ment in this regard is hierarchical (cascading) 
dynamic query systems. 
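
One common way to implement the relevance feedback loop described above is a
Rocchio-style update, sketched below. The essay does not name a particular
algorithm, so the choice of this technique, the function names, and the
default coefficients (widely quoted textbook values) are all assumptions made
for illustration.

```python
from collections import Counter

def vectorize(text):
    """Represent a text as a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward relevant documents and away from nonrelevant ones."""
    new_query = Counter()
    for term, weight in query.items():
        new_query[term] += alpha * weight
    for docs, sign, coeff in ((relevant, 1, beta), (nonrelevant, -1, gamma)):
        for doc in docs:
            for term, weight in doc.items():
                # Average each judged set so its size does not dominate.
                new_query[term] += sign * coeff * weight / len(docs)
    # Terms whose weight is not positive are dropped from the new query.
    return {t: w for t, w in new_query.items() if w > 0}

def score(query, doc):
    """Dot product of query weights and document term frequencies."""
    return sum(w * doc.get(t, 0) for t, w in query.items())
```

After one round of judgments, the modified query can match documents that
share no words with the original query, which is how the "intellectual
neighborhood" is refined from one iteration to the next.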

Another development that improves search and 
discovery on the Internet is the use of indexing 
programs called spiders or robots⁴ that systematically
link to Web sites, note whether the site has
previously been visited, and record basic meta- 
data about each site (sites may also contribute 
indexing information voluntarily). These pro- 
grams have made the Web somewhat searchable 
without constraining the browsing features of 
servers or clients. It is important to note that 
these services do not really represent a catalog of 
the Internet but rather a listing of home page 
words. Additionally, to avoid tying up network 
resources, spiders do not traverse all links in a 
site (thus a more substantive image of what the 
site contains and to what it links is not available).
Another system⁵ provides full-text
retrieval but also allows searches on SGML tags
and supports multilingual searching. Another
approach is illustrated by the Harvest project,⁶
which separates the indexing gatherers from the 
indexes themselves (brokers). This allows multi- 
ple and customized indexes to be tailored for 
specific communities. 
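
The two ideas in this paragraph, a spider that tracks visited sites and a
separation of gathering from indexing, can be sketched together. The example
runs over a small in-memory "Web" rather than the network; the page names,
the depth limit, and the function names are hypothetical and do not reproduce
the actual design of Harvest or of any spider.

```python
from collections import deque

# Toy "Web": URL -> (title, outgoing links). All names are hypothetical.
WEB = {
    "a.example/": ("Medieval Maps", ["a.example/atlas", "b.example/"]),
    "a.example/atlas": ("Atlas Scans", []),
    "b.example/": ("Greek Texts", ["a.example/", "c.example/"]),
    "c.example/": ("Textile Archive", []),
}

def gather(start, web=WEB, max_depth=1):
    """Breadth-first traversal that records metadata once per page."""
    visited, records = set(), []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if url in visited or url not in web:
            continue  # already seen, or not reachable in the toy Web
        visited.add(url)
        title, links = web[url]
        records.append({"url": url, "title": title})
        if depth < max_depth:  # limit traversal to spare network resources
            queue.extend((link, depth + 1) for link in links)
    return records

def build_broker(records, keep=lambda record: True):
    """Build one index (word -> URLs) from shared gatherer output."""
    index = {}
    for record in filter(keep, records):
        for word in record["title"].lower().split():
            index.setdefault(word, []).append(record["url"])
    return index
```

Because the gathered records are independent of any one index, several
brokers with different `keep` filters can be built from a single traversal,
each tailored to a different community.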

The most important illustrations of integration 
are the developments in interactive interfaces 
that closely couple search, evaluation, and re- 
formulation. Dynamic queries, fisheye views, 
semantic maps, and other visualization mecha- 
nisms illustrate such integration. The quality of 
electronic display continues to improve as fonts, 
backgrounds, color, and resolution continue to 
offer more accurate representations for paper 
documents and other information objects (see 
Kolker and Shneiderman). One project that 
tightly couples textual information and graphi- 
cal information is the Piero project [9], where 
relational database entities are linked to a three- 
dimensional visual database, allowing users to 
search and discover textually or visually.

Challenges in the Humanities
Although the research and development trends 
discussed above are applied in all domains, the 
humanities offer special challenges for search
and discovery. First, the humanities celebrate 
individuality; information resources take many 
forms, and scholars often resist the imposition 
of standards. These effects are most apparent in
word-based searching, which is complicated by 
the opposing concerns of creators who endeavor
to find unique and figurative language (whether
the language of expression is textual, aural, or 
visual) and searchers who endeavor to map their 
needs onto language. Asking authors to use 
standard language is ludicrous, so it remains for 
editors, librarians, curators, and other informa- 
tion specialists to create customized indexes and 
guides to the literature. Furthermore, individu- 
ality leads to the creation of many fairly small 
corpora specific to individual scholars rather 
than few huge collections created by large com- 
munities of scientists (e.g., the databases of the 
Human Genome Project, Earth Observation 
System databases). Thus, in the humanities, it is 
especially critical to create and maintain special- 
ized and multiple indexes. 

Second, information resources in the humani- 
ties are less sensitive to time than resources in 
the sciences; although some searching in the 
humanities may be limited by period, the tem- 
poral range is typically wide. Thus, finding aids 
and interfaces may not be able to easily leverage 
time constraints. Also, these indexes and guides 
themselves must evolve as word usage evolves 
over time. 

Third, humanities resources are often multi- 
lingual. Individual works may use expressions 
from multiple languages, and resources related 
to a topic or artist may be available in multiple 
languages. Since English is a de facto standard 
for science and technology, most of the discov- 
ery tools are specific to English (although statis- 
tical retrieval techniques such as latent semantic 
indexing and n-gram analyses (e.g., [3]) have 
proved generalizable across multiple languages). 
Machine translation research that uses an interlingual
language (e.g., [5]) may also prove useful
for indexing multilingual corpuses.
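
To suggest why n-gram techniques carry across languages, the sketch below
compares texts by their overlapping character trigrams, which requires no
word list, stemmer, or grammar for any particular language. It is a minimal
illustration, not the method of [3]; the function names and the choice of
cosine similarity are assumptions.

```python
from collections import Counter
from math import sqrt

def ngram_profile(text, n=3):
    """Count overlapping character n-grams, ignoring case and extra spaces."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0
```

A query such as "peinture" scores higher against "peintures murales" than
against an unrelated string, even though the code knows nothing about French
morphology.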

Fourth, data acquisition and digitization are
expensive and time-consuming. Simply adopting 
a controlled vocabulary such as the AAT is a sig- 
nificant change for cataloging new acquisitions, 
but the retrospective conversion of local cata- 
loging records is intellectually challenging (and 
controversial) as well as expensive. Also, digitiz- 
ing text is challenge enough, but much of the 
content of the humanities is graphical, aural, 
and three-dimensional. Capturing and storing 
images or sound at high resolutions is both 
time-consuming and open to criticism vis-a-vis 
interpretiveness. Furthermore, the compression
scheme used will determine or limit what surrogates
can be made available for browsing.

RESEARCH NEEDS AND DIRECTIONS
The special challenges the humanities offer for 
search and discovery research and the continued 
evolution of the Internet suggest several themes 
for future research and development. 

Multiple Approaches

Because humanities scholars typically do not 
look for answers to well-defined questions but 
rather elaborate threads of discourse, traditional 
database techniques will not suffice. Humanities 
scholars and communities need to create and 
share thematic indexes specific to their own 
interests and expertise. The metaphor of self- 
organizing systems — many minds creating entry 
points for search and discovery — seems more 
appropriate both for a worldwide network of 
information and for the spirit of the humanities 
than the top-down metaphor of one great mind 
(or committee) that provides an organizational 
framework for some master index. Because it is 
in their personal interest to create such thematic 
indexes, humanities scholars will do so without 
funding (funding will speed up this process). 
There are, however, two crucial needs for 
research support in this regard. 

First, we must learn how to aggregate thematic
indexes and forge among them links that are
activated according to the ontological perspec- 
tive of the information seeker (this may be 
thought of as a kind of intellectual interoper- 
ability). Thus, information seekers can specify a 
school of thought and be given sets of links cus- 
tomized to that perspective. Another user with a 
different perspective would find a different set 
of links for the same corpus. Research in thesaurus
merging ([7]), scheme merging ([11]),
and ontology definition ([13]) may eventually 
be helpful here. 

Second, scholars should be encouraged to create 
pathfinders: guides to themes or topics that give
not only pointers to information resources
but also critical commentary and interpretations 
about those resources. Since it is likely that we 
will see the continued development of indepen- 
dent, non-standard collections of information — 
each a uniquely organized expression 
celebrating human innovation and creativity — 
it makes sense that these collections themselves 
should become subject to study, critique, and 
interpretation. Thus, the purposeful aggregation 
and added-value commentary that define 
pathfinders in the humanities represent a form 
of scholarship that deserves directed research 
attention. Commentaries have long been part of
scholarly practice in the humanities, but electronic
environments provide new possibilities
for creating critical threads through the elec- 
tronic morass that themselves may include 
interactive aspects; e.g., using a pathfinder a 
second time will be different since it will take 
advantage of knowledge about what you have
already examined. How this knowledge is used 
requires creative and scholarly decisions on the 
part of the creator of the pathfinder. 

Because Internet resources will be available to a 
broad range of users, from children to seasoned 
scholars, there must be simple as well as power- 
ful tools for search and discovery. Although 
these are not mutually exclusive requisites, there 
is a need for developments of progressively 
powerful tools as well as tools tuned to specific 
types of users (see Murray paper in this collec- 
tion). A related need is for systems that provide 
multilingual interfaces as well as search and dis- 
covery tools that handle multilingual corpuses. 
Both of these needs have positive implications 
for the humanities, since they will lead to new 
classes and groups of users. 

Other Needs 

Clearly, more materials in the humanities need 
to be transferred to electronic form (see Kenney 
paper in this collection). Especially for text- 
based fields, techniques for automatically cate- 
gorizing and summarizing text fragments will 
be necessary if information seekers are to maximize
their time and memory resources when
examining and scanning candidate texts. It 
seems prudent to look for ways to combine sta- 
tistical approaches with knowledge-based 
approaches. For image-based fields, techniques 
to extract and match patterns must be com- 
bined with whatever word-based information is 
available (see Romer paper in this collection). 
Regardless of the medium (text, audio, images), 
interface mechanisms that allow rapid scanning 
(e.g., zooming and panning; fast-forward, multiple
display panels, etc.) are essential to an integrated
search and discovery environment.

Finally, scholars must consider their audiences 
both during and after the creation of their 
work. First, during creation, the work can be 
tailored to make it easier for the audience to 
find it. On the crass side, this is advertising 
before art; on the scholarly side, this is tailoring 
expression to be best understood by one's public.
Second, after creation, the scholar can point
the work at audiences. This is what publishers 
currently do, but a networked world allows
creators to broadcast or narrowcast as they
please. This closer link between creators and
consumers depends on the development of tools
that support creation, communication, and 
maintenance of digital work. (We can imagine 
next iterations of hypertext authoring systems 
such as Storyspace that automatically generate 
HTML and browser scripts that monitor usage 
statistics for automatic (or random) mutations 
or author version control.) Surely, tools will 
emerge that allow creators to produce viral 
works that change depending on use (or alter- 
natively, appear in different forms in different 
environments). Persistence and stability enable 
static indexing and locational aids to work in 
today's libraries. We need research to determine
how to document, find, and use new genres of 
interactive and evolving intellectual products.

Notes

¹ The author wishes to thank David Bearman,
Gregory Crane, Elli Mylonas, and Michael
Neuman for comments on a previous version of
this essay.

² See the Center for Intelligent Information
Retrieval Web site. For information on Inquery see
http://ciir.cs.umass.edu/.

³ See the Oard Web site for a set of pointers to
filtering research:
http://www.enee.umd.edu/medlab/filter/.

⁴ For example, Lycos [http://lycos.cs.cmu.edu/] and
Yahoo [http://www.yahoo.com/] services allow
simple word searching on several million Web
sites; Yahoo provides a simple classification system
for limiting searching.

⁵ OpenText
[http://www.opentext.com:8080/omw.html]

⁶ http://harvest.cs.colorado.edu/

STATE OF THE ART

This paper will focus on the conversion into electronic form of traditional source materials, includ- 
ing books, journals, manuscripts, graphic materials, and photographs, which serve as the primary 
documentation for research in the arts and humanities. Although it acknowledges other means for 
electronic conversion, this paper will emphasize the use of imaging technology to produce digital 
surrogates for paper- and film-based sources. 



Digital images are “electronic photographs” that can accurately render not only the information con- 
tained in original documents, but also their layout and presentation, including typeface, annotations, 
and illustrations. High fidelity to the source material can be obtained in digital images, which can be
displayed on-screen or used to produce paper and film copies, or transmitted over networks to 
researchers around the world. The main drawback to digital images today is that they are “dumb” 
files, not data that can be manipulated (for example, searched and indexed). 

Efforts to convert materials originally created in print form to machine-readable formats have been 
ongoing for nearly half a century, but the major thrust for arts and humanities research began in the 
1970s when important sources in linguistics, classics, religion, and history were converted to elec- 
tronic texts. The Thesaurus Linguae Graecae (TLG), begun in 1972, was the first significant 
American conversion effort, and since then a growing number of institutions have initiated major 
projects to create computer-processible electronic texts. The Center for Electronic Texts in the
Humanities (CETH), established by Rutgers and Princeton in 1991, maintains an inventory of existing
electronic texts (available through RLIN, the Research Libraries Information Network) and provides
summer seminars on setting up and managing electronic text centers.

Such efforts have not sought to replace source documents but to create electronic transcriptions of 
texts for quantitative and qualitative analysis. The creation of electronic texts has expanded and 
matured with the development of standardized approaches and common protocols such as the Text 
Encoding Initiative (TEI), a collaborative effort to define means for encoding machine-readable text
that would make electronic exchange feasible; and the widespread adoption of ISO 8879, Standard 
Generalized Markup Language (SGML), a standard set of instructions for composing structured 
machine-readable documents that encodes the function rather than the appearance of elements with- 
in a document. Notable current efforts in the use of such encoding may be seen on the World Wide 
Web, which supports Hypertext Markup Language (HTML) documents, and in the California 
Heritage Digital Image Access project to develop navigation tools to move from online catalog 
entries to SGML-encoded finding aids and ultimately to a database of digital images documenting 
California history. 

Beginning in the mid-1980s, efforts to use imaging technology to create digital surrogates began, 
first at the National Library of Medicine, then the National Archives and Records Administration. 
Although these pioneering efforts provided significant information on the use of digital imaging, 
they did not result in sustained efforts for several reasons, primarily because they were difficult to 
justify economically. By the beginning of this decade, however, several developments converged to 
promote the use of digital imaging, including the following: 

♦ dramatic improvements in personal computer technology, including rapidly declining costs coupled
with greatly increased power and storage capacity;




♦ consequently, exponential growth in the use of personal computers;

♦ spread of high-speed, high-bandwidth networks
accessible to millions worldwide;

♦ emergence of client/server architecture and 
network-organizing architectures such as the 
World Wide Web; and 

♦ high-quality, high-production scanning 
systems. 

At approximately the same time, a major
national initiative to preserve the intellectual 
content of brittle books through microfilming, 
spearheaded by the Commission on 
Preservation and Access and the National 
Endowment for the Humanities, opened the 
door for the acceptance of surrogates or replace- 
ments for original sources on a grand scale that 
in turn stimulated the use of digital imaging 
technology in library applications. 

By the mid-1990s, digital imaging was making 
inroads into the domain hitherto reserved for 
textual conversion projects. The technological 
infrastructure had matured enough to support 
the creation, storage, transmission, and display 
of digital images. Although digital image files 
are much larger than equivalent text files, they 
became cheaper to produce (approximately 
$0.25/image), whereas a fully corrected encoded-text
equivalent could cost ten times that
amount. Further, many of the documents con- 
sulted by researchers, particularly in the arts and 
humanities, are graphic (photographs, illustra- 
tions, prints, drawings, maps) and currently 
cannot easily be rendered as encoded files. The 
process of converting text-based material to 
alphanumeric files through optical character 
recognition (OCR) programs begins with the
creation of digital images, and the two steps
(imaging and text programming) could be
uncoupled and conducted at separate times. 
Proponents of imaging argue that the latter step 
could await user needs and capabilities for 
sophisticated processing of text or the matura- 
tion of OCR programs to render more accurate 
representations of information, particularly for 
sources in non-Roman languages, handwritten 
documents, or those that are unevenly printed 
or produced with older type fonts. Today, imag- 
ing is the most cost-effective means for retro- 
spectively converting arts and humanities source 
materials to digital form, and represents in 
effect the lowest common denominator for net- 
work distribution. 



Nonetheless, user expectations at the terminal 
are that the full text of important sources for 
their discipline should be available online, 
quickly accessible, and fully manipulatable. 
Researchers who accept and use printed books 
and journals — or even microfilm — often ques- 
tion the value of a digital image surrogate: 

“What good is this image if I can't search it
with keywords?” This question must be satisfactorily
addressed in the next few years if digital
imaging technology is to be used effectively in a 
massive conversion of text-based sources and in
the development of distributed digital libraries. 
Currently the most promising use of digital 
image technology may lie in the rendering of 
graphic and photographic materials. 

CURRENT RESEARCH AND TRENDS 
Two major trends have characterized the most 
significant arts and humanities projects involv- 
ing the use of digital image technology over the 
past five to seven years: the move toward the 
creation of sizeable databases and their initial, 
non-networked use; and investigations into
issues associated with image capture. Among 
the former, the most noteworthy example is the 
digitizing of eleven million pages from the 
Archivo General de Indias in Seville, Spain, that 
document the Spanish colonization of the 
Americas. Begun in 1988 as part of the com- 
memoration of the 500th anniversary of 
Christopher Columbus' discovery of America, 
this project has completely revolutionized 
archival practice in the Archivo and researcher 
use of primary documents. While the scanning 
(100 dpi, with 16 levels of gray retained) does 
not capture all the information contained in the 
source documents, the objective to provide on- 
screen use has been successfully met. Almost all 
use of converted materials in research occurs via 
computer. This project’s most significant 
accomplishment has been the creation of 
machine-readable finding aids and catalogs providing
access to digitally rendered documents
down to the item level. Initial plans are being 
developed to extend network access to the 
archives to other Spanish repositories. It is 
uncertain at this time whether or when interna- 
tional access over the Internet will be made 
available. Consideration is being given to dis- 
tributing the most significant portion of the 
collection via CD-ROM. 
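
The storage implications of these capture settings can be estimated with
simple arithmetic. The sketch below assumes a page of 8.5 x 11 inches and no
compression; the source gives only the resolution (100 dpi), the gray levels
(16), and the page count, so the page size and the function name are
assumptions for illustration.

```python
def uncompressed_bytes(width_in, height_in, dpi, gray_levels):
    """Uncompressed size of one scanned page, in bytes."""
    bits_per_pixel = (gray_levels - 1).bit_length()  # 16 levels -> 4 bits
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel / 8

# Assumed page size of 8.5 x 11 inches; the source states only dpi and gray levels.
page = uncompressed_bytes(8.5, 11, dpi=100, gray_levels=16)
collection = page * 11_000_000  # eleven million pages
print(f"{page / 1e6:.2f} MB per page, {collection / 1e12:.1f} TB uncompressed")
```

Under these assumptions a page occupies a little under half a megabyte and
the full collection about five terabytes before compression, which helps
explain the interest in distributing only a selected portion on CD-ROM.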

Other major conversion projects include those 
conducted at the Library of Congress 
(American Memory), the National Agricultural
Library (which has embraced a goal of replacing
the traditional collection with a digital one), the
Naval Research Lab (which is converting its
large collection of unclassified documents), the
National Library of Medicine (where access to
60,000 images of photographs, art work, and
printed texts is provided), and Cornell and Yale
universities. Within the past year, multi-institutional
digital library initiatives in the arts and
humanities have been launched or announced, 
including those at the Library of Congress (to 
digitize five million pages of American history 
sources by the year 2000); the Making of 
America Project (Cornell, Michigan, and other 
research institutions) to convert and make network-accessible
10,000 volumes (and ultimately
100,000 volumes) on American history; the
UNESCO-sponsored Memory of the World 
Project; and the recently announced initiative to 
create a national digital library on American 
heritage and culture by a federation of thirteen 
research institutions. The federation will formu- 
late selection guidelines, adopt common stan- 
dards and best practices for conversion, 
guarantee universal accessibility across the 
Internet, facilitate archivability and enduring 
access, and evaluate use and the effects on 
libraries and other institutions. 

Although a number of digital imaging projects 
are beginning to evaluate the use of digitized 
material (including those sponsored by the 
NSF/ARPA, the Mellon Foundation, and the 
Getty Art History Information Program/MUSE 
Educational Media), more rhetoric than sub- 
stantive information has emerged on the impact 
on scholarly research of creating digital collec- 
tions and making them accessible over net- 
works. Preliminary information should be 
forthcoming in the next two years, but compre- 
hensive data may well await the creation of crit- 
ical masses of digitized collections that can 
sustain basic research and the means not only 
for navigating collections but also using them 
effectively in an online environment. 

The second major research trend is defining 
image capture guidelines and quality assessment 
processes in the absence of any official standards 
governing image quality in digital conversion to 
digitally rendered documents. Under the direc- 
tion of Michael Ester, the Getty Art History 
Information Program pioneered work in examining
the relationship between image quality
and viewer perception, principally with graphic 
materials. Cornell and, more recently, the 
Library of Congress in conjunction with Picture 
Elements have established quality benchmarks
for the conversion of textual sources that are
based on the attributes of the source documents 
themselves and the effects on image quality of 
resolution, gray scale, and compression. The 
two institutions have agreed to collaborate on a 
joint investigation to extend this work to a 
broad range of source materials. The Research 
Libraries Group, in cooperation with the Image 
Permanence Institute, explored technical issues 
associated with the digital conversion of photo- 
graphic materials; the latter will build on this 
effort through a two-year project to conduct 
both objective and subjective image quality 
evaluations, develop quality benchmarks, and 
suggest technical standards for photographic 
conversion. In two complementary projects, 
Cornell and Yale universities will examine the 
costs, processes, and quality implications for 
creating both digital images and microfilm. 
Columbia University recently completed a 
small-scale project on the quality implications 
of converting oversize color maps. 

The principal investigators of these and other 
projects have argued for digitizing in a manner 
to ensure full capture of significant information 
present in the source documents. Some advo- 
cate the creation of an “archival” digital master 
for preservation purposes to replace rapidly 
deteriorating original source documents. Others 
consider the cost benefit of selecting, preparing, 
and digitizing material once at a high enough 
level of quality to avoid the expense of recon- 
verting at a later date when technological 
advances require or can effectively utilize a rich- 
er digital file. Others suggest that derivatives 
can be created from the master to meet current 
user needs, and that the quality of these deriva- 
tives is directly affected by the quality of the 
initial capture. Various digital outputs have dif- 
ferent quality requirements; high resolution 
may be required for printed facsimiles but not 
for on-screen display and use. 

NEAR TERM 

It is anticipated that within two years, quality 
benchmarks for image capture for the range of 
paper- and film-based research materials — 
including text, line art, halftone, and continu- 
ous tone images — will be well defined for a 
variety of outputs (paper, film, and then on- 
screen display). For the most part, these will be 
designed to be system independent, will involve 
the creation of sophisticated technical test tar- 
gets, and will be based increasingly on the 
attributes and functionality characteristic of the 
source documents themselves. These efforts
began with determining what was technologically
possible; current and near-term efforts are
directed at determining what is minimally 
required to satisfy informational capture needs. 
At present, the trend is to set image quality 
requirements at a level sufficient to capture sig- 
nificant informational content for a broad range 
of source documents at the expense of file size 
so as to avoid the labor and expense of perform- 
ing item-by-item review. 

Although technical, these benchmarks will also 
take into consideration the subjective evaluation 
of curators and the needs of researchers. The 
Image Permanence Institute plans to incorpo- 
rate psychometric scaling tests in its evaluation 
of digitally converted photographs and photo- 
graphic intermediates. Quality assessments will 
extend beyond capture requirements to the pre- 
sentation and timeliness of delivery options. 

From an industry perspective, research into 
image capture has slowed; the current emphasis 
is on bringing to market scanning systems that
will offer a range of moderate- to high-quality 
capture options, but more importantly, faster 
throughput, greater flexibility in accommodat- 
ing a variety of source documents, and better 
calibration across scanners and peripherals (e.g., 
printers and display devices). The industry will 
move to high-production gray-scale/color scan- 
ning systems that can meet the performance 
records of bitonal (black-and-white) scanners. 

The most promising scanning devices to appear 
in the next several years will be planetary and 
digital cameras, such as those now coming on 
the market from Minolta and Zeutschel, that 
can handle bound volumes, three-dimensional 
objects, fragile material, and oversize documents 
in a non-damaging fashion and without resort- 
ing to the creation of photo-intermediates. 
Unlike flatbed scanners, digital cameras will 
enable operators to exercise greater control over 
resolution, lighting, and color balance. It may 
be several years before digital cameras compete 
effectively with photography, however. 

Increased quality and performance can also be 
anticipated from film scanners, such as the 
Sunrise scanner that allows for high resolution 
and gray-scale capture. 

Technically sophisticated software for image
quality assessment and calibration, such as
ImageXpert™, which incorporates fourteen dif-
ferent tests (e.g., modulation transfer function
(MTF), signal-to-noise ratio, gray resolution,
dimensional accuracy, color registration and
consistency) will provide operator-independent
objective tests of system performance. Until 
recently, such tests were beyond the capabilities 
of all outside industry or research labs. Color 
management systems are also now available to 
calibrate color data across imaging systems and 
individual components (scanners, monitors, 
printers). The Munsell Lab at the Rochester 
Institute of Technology has conducted extensive 
research on managing color data through the 
whole digitization process. Several projects 
focusing on art reproductions, VASARI and 
Methodology for Art Reproduction in Color
(MARC), are exploring alternatives for achiev- 
ing true color fidelity. 

The next generation of software programs to 
govern image quality should incorporate smart 
systems for automatic, on-the-fly applications of 
appropriate capture processes (resolution, gray 
scale, filters, etc.) based on an assessment of 
document attributes and explicit institu- 
tional/user profile requirements. Early proto- 
types for this may be seen in the Xerox 
XDOD “autosegmentation feature” that
detects the presence of halftones and applies
descreening and halftone filters to those por-
tions while treating text and line art with sepa-
rate image enhancement algorithms designed
to optimize contrast and detail. Instead of cre-
ating separate windows for halftone and textu-
al content, a future approach may be to create
layered images, with bitonal capture preserved 
in one layer, tonal reproduction in another, 
color saturation in a third. 

In the longer term, programs will contain fea- 
tures for automatic image quality verification, 
designed to check not system performance, but 
the digital files themselves. These will automati- 
cally match quality guidelines to desired out- 
puts: paper, film, and (in the case of on-screen 
display) the monitor’s capability. 

User requirements for derivative “access” 
images, including speed of display, browsing 
versus detailed examination, and color/tonal 
fidelity, will also become programmable. An 
early example of such considerations is seen in 
“progressive transmission,” in which a complete 
but low-resolution image is transferred quickly;
detail is added gradually until full image cap-
ture is displayed or the reader halts the trans-
mission. Kodak and Live Picture, Inc. recently
signed an agreement to develop capabilities for
transmitting, viewing, and manipulating
high-quality images with less computing and
networking capabilities than are currently 
required. The Live Picture technology stores 
images as a sequence of discrete subimages, 
making it possible to access only those portions 
needed for transmission or editing at relatively 
high speeds, even over regular telephone lines. 
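
The idea of progressive transmission can be illustrated with a minimal sketch. It assumes a grayscale image held as a list of rows and a simple subsampling scheme with strides 4, 2, and 1; both are illustrative assumptions, and the function names are hypothetical. It does not describe the Kodak or Live Picture technology.

```python
def progressive_passes(image, strides=(4, 2, 1)):
    """Yield (stride, samples) passes; each pass sends only pixels not sent before.

    Assumes the last stride is 1 so that every pixel is eventually sent.
    """
    sent = set()
    for stride in strides:
        samples = []
        for y in range(0, len(image), stride):
            for x in range(0, len(image[0]), stride):
                if (y, x) not in sent:
                    sent.add((y, x))
                    samples.append((y, x, image[y][x]))
        yield stride, samples


def apply_pass(canvas, stride, samples):
    """Paint each received sample over a stride-by-stride block of the canvas."""
    height, width = len(canvas), len(canvas[0])
    for y, x, value in samples:
        for yy in range(y, min(y + stride, height)):
            for xx in range(x, min(x + stride, width)):
                canvas[yy][xx] = value


# Usage: the receiver shows a coarse picture after the first pass and may stop
# consuming passes at any point once the detail is sufficient.
image = [[y * 4 + x for x in range(4)] for y in range(4)]
canvas = [[None] * 4 for _ in range(4)]
for stride, samples in progressive_passes(image):
    apply_pass(canvas, stride, samples)
```

After the first pass the canvas is a complete but coarse picture; each later pass only adds detail, which is what lets a reader halt the transfer early.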

Some of the more promising industry research 
focuses on conversion with functionality, bring- 
ing intelligence to digital files. Most attention 
to date has been given to improving the accura-
cy and performance of OCR technology that
can accommodate a broadening range of lan- 
guages and text-based representations. Adobe 
Systems’ new Acrobat Capture software incor- 
porates OCR technology with bitmap imaging 
to create text-searchable files while retaining 
typefaces, graphics, and the original page lay- 
out. The combined text and image information 
present on an illustrated page, for example, are 
compressed with the most appropriate compres- 
sion process and combined into a Portable 
Document Format (PDF) which is smaller than 
a compressed digital image. The accuracy rate
of conversion can be set so that pages or por- 
tions of a page that challenge the software's 
capabilities can be retained as bitmapped 
images. According to a recent press release, the 
military is considering using Acrobat Capture 
to convert twenty million pages of text. 

Perhaps more significantly for future navigation 
of large image databases of mixed content, 
attention is being paid to pattern matching and 
object recognition for non-textual information 
present in digital images: symbols, spatial 
dimensions, orientation, and facial features, for 
example. Excalibur is extending its OCR pro-
gramming to accommodate face recognition,
and Photodex is experimenting with database 
searching via an iconic interface. Providing 
computer-processible, eye-readable digital 
images for graphic materials represents the next 
logical step along this continuum. Initial work 
has begun to convert raster images to vector 
images, a popular process used in Geographical 
Information Systems (GIS) and in engineering
and architectural applications. In vectorization, 
an image is generated by a set of mathematical 
equations that describe points and locations 
within the image. They can be computationally 
altered to provide image functionality and 
manipulation. The long-range potential is to 
replace raster images (captured dot by dot) with 
vector images, which will result in greater func-
tionality for searching, sorting, and manipula-
tion, and greatly reduce storage requirements.
Issues associated with quality, however, must be 
carefully evaluated in this conversion process. 

Research, too, is focusing on more efficient 
compression processes that preserve fidelity and 
minimize the introduction of artifacts or noise. 
Work in the development of fractal and wavelet 
compression techniques is still underway. In an 
application for Citibank, Kodak is applying 
highly efficient, syntactical image compression 
to store photo-identification as barcodes on the 
back of credit cards. It is envisioned that these 
will be read at retail stores, where the physical 
identification of the customer can be verified. 
This system of compression, based on building 
a taxonomy of like attributes (e.g., a library of 
facial features) may ultimately have broad appli- 
cations for a wide range of source materials. 

FUTURE RESEARCH NEEDS 
Research needs in the digital conversion of tra- 
ditional materials fall into three categories: eco- 
nomic, technical, and evaluative. Generally 
stated, the technology must become cheaper, 
better, and faster. Economically viable scanning 
processes and services are critically needed. 
Higher scanner throughput must be coupled
with high-quality image capture capabilities and
automated means to ensure consistency of per- 
formance and quality control. Research institu- 
tions must work with vendors to jointly develop 
cost-effective imaging service capabilities of 
high quality and standardized means for creat- 
ing/capturing the requisite meta-data for order- 
ing and navigating the digital images 
themselves. The means for capture and indexing 
should be non-proprietary in nature and should 
lend themselves to network distribution and 
future digital applications, such as OCR, struc- 
tural linking, and visualization techniques. 
Definitions for creating an audit trail on con- 
version decisions must be incorporated into 
header information for each image. 
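
As a rough illustration of what such an audit trail might look like, the sketch below attaches a list of conversion decisions to a per-image header. The class names, field names, and sample values are all hypothetical; the report does not define a header format, and JSON is used here only as a convenient non-proprietary serialization.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ConversionDecision:
    step: str      # e.g. "resolution" or "compression" (illustrative labels)
    setting: str   # the value chosen for that step
    reason: str    # why it was chosen, in the operator's words


@dataclass
class ImageHeader:
    source_id: str                                  # hypothetical identifier of the source document
    decisions: list = field(default_factory=list)   # the audit trail itself

    def record(self, step, setting, reason):
        """Append one conversion decision to the audit trail."""
        self.decisions.append(ConversionDecision(step, setting, reason))

    def to_json(self):
        """Serialize the header, including nested decisions, as JSON text."""
        return json.dumps(asdict(self), indent=2)


# Usage with made-up values:
header = ImageHeader(source_id="map-001")
header.record("resolution", "600 dpi", "fine engraved lettering")
header.record("bit depth", "8-bit gray", "stained paper, tonal detail needed")
```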

Processes for selection, conversion, intellectual 
control, and retrieval must be automated or 
semi-automated if digital imaging is to become 
an attractive economic alternative. In the near 
term attention should be directed to matching 
circulation records with selection decisions, 
deriving intellectual control from the digital 
files themselves, evaluating the utility of fast 
browsing over textual description, and creating 
a juried, interactive meta-database that could 
accommodate user input and differentiated lev- 
els of access. 

Business cases demonstrating the economic
advantages of digital imaging applications to 
research libraries and cultural institutions must 
be developed and verified. The case for shared 
responsibility and enhanced access to distrib- 
uted sources over the institutional ownership of 
hard-copy sources must be made firmly and 
convincingly. The Andrew W. Mellon 
Foundation is funding a number of digital ini- 
tiatives designed to provide economic compar- 
isons between traditional library costs and those 
associated with digital library development. 

User needs, perceptions, behaviors, and adapta- 
tion to online sources must be studied in detail. 
Preliminary studies suggest that researcher 
acceptance of image databases will depend on 
their convenience, speed of access, and degree 
of user control. It should be presumed that the
development of sizeable image databases rich
enough to support in-depth research is neces-
sary, but not sufficient, to facilitate scholarly
acceptance of the change from hard copy to 
online resources. Means for navigating, retriev- 
ing, annotating, synthesizing, and presenting 
information at the desktop must also be 
devised. These capabilities must be developed in 
an iterative, user-centered fashion because 
researchers’ needs will change with time and 
their increasing level of sophistication with 
using digital technology. Greater human con- 
trol, requiring less human intervention, will be 
necessary. 

Although navigation, retrieval, and utility issues 
will be central to this research, dramatic 
improvements in electronic display must be 
achieved. Research and development of moni- 
tors and other projection devices that make it 
possible to display documents in their original 
size with full legibility is essential. Ergonomic 
issues associated with scholarly research habits 
(e.g., eyestrain, body positioning) deserve 
greater exploration. Control and flexibility in 
terms of display, access time, and functionality 
must rest with the end user. In addition to 
improved display, research will be needed to tie 
image presentation more closely to visual per- 
ception rather than technologically consistent 
approaches. 

INTRODUCTION

Designing an effective multimedia retrieval system is a complex challenge, primarily because existing
guidelines for text-based systems do not entirely apply to the new technology. Fresh analytical chal-
lenges confront the multimedia cataloger, for instance, who, to optimize retrieval, must conceptually
and perceptually deconstruct materials across several cognitive dimensions. But existing cataloging
tools have yet to catch up with the fact that multimedia description tasks need greater expressive
power. This paper discusses these issues as they relate to arts and humanities collections. Sometimes
image databases will be used to illustrate a topic, but the central issues are shared broadly by all mul-
timedia applications.

DIGITAL LIBRARIES

The discussion of image databases in the literature over the last several years bears a striking similari-
ty to the literature describing the development of library automation systems. Beginning with basic
inventory management concerns, library systems eventually grew more sophisticated in work flow
integration, control functions, and enhanced public access.

Today, most image databases are like library automation systems of the early 1980s (i.e., proprietary,
and weak on retrieval for all but the most adept). Through the 1980s library systems eventually grew
into Integrated Online Library Systems (IOLS), with isolated components united into more fluid
structures of communication. Further, productive research into retrieval technologies brought gener-
al-purpose access methods to a diverse set of system users beyond the caretakers of a collection.

Image databases have not yet smoothly integrated work flows, nor has research resulted in an inte-
grated, widely usable institutional system.

Many years' work, however, preceded IOLS development, especially in classification, cataloging,
and public access methods. If one looks back far enough, the bibliographic record as we know it
today can even be traced to the cataloging record attributed to Kallimachos in his tenure at the
Library of Alexandria [1]. For the items that an image database will need to classify, catalog, and
retrieve, there is no corresponding historical tradition that can be drawn from, which is a limiting
factor in the development of multimedia applications.

Essentially, this long tradition of organizing ideas, however imperfect, provided the library automa-
tion community with a necessary framework on which system developers could build structures. The
same generalized methods have not yet materialized for cataloging images and multimedia objects.
Many accounts in both the scholarly literature and the trade press describe an organization’s rush to
acquire multimedia database software, only to face the most pressing problem of all: how to describe
the materials in question to achieve effective retrieval from the system just purchased. As these data-
bases scale upward in size, collection managers soon begin to realize that applying existing descriptive
methods may be more likely to bury their assets than to provide the wide retrieval they hoped for.

BACKGROUND CONCERNS AND ISSUES

Retrieval technologies are fundamentally judged by how their search tools perform. For database
users, the most memorable part of their interaction with the system is with the algorithms that
answer their questions. In reality, the key to success is heavily dependent on the quality of the data
preparation environment that supports database design, documentation, and cataloging activities. If
one looks carefully at why various multimedia projects fail to yield the expected benefits, one often
finds that the data preparation step was not adequately formulated. Cataloging itself rests upon yet
another layer, data representation, which refers to the choice of abstraction needed to manipulate
data on a computer platform. A brief outline of these three issues follows in order to establish a
context for later discussion.

DATA REPRESENTATIONS 
Text-based descriptions, the most common 
form of data representation used by database 
technologies today, have proved to be very ade- 
quate representations for text-based materials. 
What could be better than using words to 
describe other words and applying linguistic 
methods computationally to linguistic struc-
tures? But how adequate are text-based methods
for non-textual materials? We have been pro-
ceeding into the multimedia age assuming that
people “read” and understand images in the
same way that they “read” and understand doc-
uments. Multimedia’s appeal to several senses
and perceptual modes actually challenges the 
use of words to describe non-textual modalities. 
Early adopters of multimedia have been con- 
firming this obvious fact as they commonly 
report an inability to perform comprehensive 
searches on their newly implemented multime- 
dia systems. 

Part of the problem is that existing methods do 
not go far enough to describe the aspects that 
differentiate a particular medium from another. 
For example, in photographic images, the place- 
ment of the camera relative to the central area 
of interest contributes important visual differen- 
tiation for “visual” retrieval purposes. Yet even 
within systems that incorporate camera angle 
and distance of the subject from the camera, 
many irregularities are found across this kind of 
description. Both the lack of standard naming 
conventions and uneven visual training among 
catalogers contribute to the problem. 

Several initiatives in the research community are 
today experimenting with non-textual represen- 
tations for multimedia content, deriving a new 
alphabet that multimedia systems will use to 
represent the content of a digital file. (For 
example, a simple non-textual representation 
for color is a hue/saturation histogram for red-
green-blue expressed as a string of integers.)
From an arts and humanities perspective, a fun- 
damental question remains unanswered. Are the 
non-textual, or “content-based,” technologies
arriving at representations that have enough 
expressive power for the materials that arts and 
humanities collections hold? Since content- 
based work promises a form of automatic 
indexing and new avenues for search interfaces,
how will traditional cataloging and search
methods be affected? Most significantly, what
happens if several competing non-textual meth- 
ods arrive in the marketplace? How will our 
carefully crafted interchange standards support 
the inevitable variety of content-based represen- 
tations that will emerge? 
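
The hue/saturation histogram mentioned above can be sketched in a few lines. The version below is only an illustration: the bin counts (6 hue bins, 3 saturation bins), the function names, and the space-separated output format are assumptions, not a description of any particular research system.

```python
import colorsys


def hue_saturation_histogram(pixels, hue_bins=6, sat_bins=3):
    """Count pixels per (hue, saturation) bin.

    `pixels` is an iterable of (r, g, b) tuples with components in 0..255.
    The bin counts are illustrative defaults, not a standard.
    """
    counts = [0] * (hue_bins * sat_bins)
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Clamp so that h == 1.0 or s == 1.0 falls into the last bin.
        h_idx = min(int(h * hue_bins), hue_bins - 1)
        s_idx = min(int(s * sat_bins), sat_bins - 1)
        counts[h_idx * sat_bins + s_idx] += 1
    return counts


def as_integer_string(counts):
    """Express the histogram as a string of integers."""
    return " ".join(str(c) for c in counts)


# Usage: two red pixels, one cyan pixel, one gray pixel.
sample = [(255, 0, 0), (255, 0, 0), (0, 255, 255), (128, 128, 128)]
signature = as_integer_string(hue_saturation_histogram(sample))
```

Two systems that choose different bin counts or color spaces produce strings that cannot be compared directly, which is one concrete form of the interchange problem raised above.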

CATALOGING METHODS 
Cataloging is essentially a process of creating 
intelligent contextual judgments, with the goal 
of assembling descriptive access points that can 
not only group items by their similarities, but 
also distinguish differences within a collection. 
Cataloging professionals predominantly use 
text-based structures as decision support tools 
to construct descriptions for a database. A well- 
defined protocol and known economy are in 
place to support this process today. Preserving 
this investment is an important consideration 
when evaluating new technologies. 

Multimedia content poses brand-new challenges 
to this effort, given the additional perceptual 
modalities introduced, which are not evenly 
represented in the text-based tool set. Image
archive managers know all too well what it is to 
find an image, then hear the further inquiry: 
“Do you have any others like this?” While the- 
matic content may be readily accessible using 
cataloged access points, retrieval by purely visu- 
al attributes is completely dependent on the 
personal “memoria technica” formed by the 
archivist’s experience with his or her collection.

The two most pressing issues for cataloging 
practices today are: 

♦ Can existing text-based structures be supple- 
mented to support multimedia cataloging, 
based on a sound understanding of human 
cognitive processing of each unique medium? 

♦ Can content-based technologies evolve to 
work cooperatively with text-based methods? 

SEARCH MODELS 

Database designers create search models to for- 
mally describe the primary retrieval tasks a data- 
base must support. For example, the user of an 
inventory database would expect retrieval by 
part number to be a natural search criterion. 
Similarly, the user of a music database may
expect retrieval by musical phrase to be a crite- 
rion for success. Consistent and psychologically 
informed search models for multimedia retrieval 
are neither readily available nor obvious. The 
search models found in both early products and 
the research literature appear to be driven by
what technology is able to do, rather than how 
people make perceptual sense of different 
modalities. Traditionally, database technology 
has assumed that one stores “answers” com- 
pletely and entirely in the database. But with 
multimedia retrieval, a greater portion of the 
“answer” to a search is located in the recogni- 
tion power of the person initiating the question. 
The adage “I will know it when I see it” 
expresses this phenomenon succinctly. 

IMAGE DATABASES AND 
TEXT-BASED CATALOGING 
Most image databases today rely on text-based 
descriptions for cataloging and search purposes. 
Whether the choice of a word is derived from 
free-form thought, or from a structured vocabu-
lary such as the Art & Architecture Thesaurus
(AAT), the “representation” is searched as a unit 
of text. The fundamental paradigm employed 
by most systems is matching the impressions 
and words of the person cataloging an image 
with the words and affective intention of the 
person searching for an image. 

For arts and humanities collections, several 
intelligently composed cataloging tools have 
been developed to enhance consistent descrip- 
tion and access. ICONCLASS, the AAT, and 
the Library of Congress Thesaurus for Graphic
Materials (LCTGM) are a few of the formal 
tools currently available. However, are they ade- 
quate for building solid descriptive cataloging 
for image databases? In a forwarded PhotoCD 
discussion note [2], the California Historical 
Society noted that combining several formal 
vocabulary tools to describe their images has 
much improved access. While the time and cost 
to complete a data record is increased signifi- 
cantly by this approach, text-based cataloging 
can be improved by a more formal coordination 
among existing tools. 

As daunting as the problems of the text-based 
approach is the different thinking modality 
associated with visual materials. No longer are 
the variable combinations of image elements, 
thematic content, and iconographic denotations 
the only issues of concern. Other more finely 
shaded interpretations are also required that are
difficult to name, such as the formal composi- 
tional rendering techniques the artist uses. 

For the most part, text-based descriptions in 
current image databases try to stay close to the 
realm of the tangible and the nameable. While
this method may work well for very small col-
lections of images, significant problems occur as
the image database begins to scale upward in 
size. It becomes much more difficult to find 
“the difference that makes the difference” to 
ensure successful searching. 

A contradictory task faces both users and collec- 
tion managers. How can the power of visual 
representation be unlocked using descriptive 
instruments that are not completely suited to 
visual differentiation? A single word may name 
an object, such as a clock, but only the limitless 
variations of compositional characteristics and 
genre denotations provide the differentiating 
factor. The old cliche can truly be reversed: A 
word (in an image database, at least) can be 
worth much more than a thousand images! 

IMAGE DATABASES AND 
CONTENT-BASED CATALOGING 
Over the last several years a number of 
researchers in computer science and electrical 
engineering schools have been working on the 
solution to the text-based dilemma, focusing on 
creating descriptions from a digital image file 
itself, a technique commonly called content- 
based description. The content-based work 
most notable for arts and humanities empha- 
sizes the recognition and description of color, 
texture, shape, spatial location, regions of inter- 
est, facial characteristics, and (specifically for 
motion materials) key frames and scene-change 
detection. 

One goal of content-based work is to provide 
algorithms that can automatically recognize the 
important features contained in an image with- 
out the need for human intervention in the 
process. Since cataloging is the most expensive 
step in multimedia database implementation, 
the promise of content-based methods has a 
strong appeal for reducing costs (and, one 
would hope, increasing indexing consistency). 

The current state of content-based technology, 
while very impressive, has yet to provide the 
generalized methods needed for wide accep- 
tance in the arts and humanities community. 
Notable work has been produced by the MIT 
Media Lab in the content-based work specifical- 
ly related to face, shape, and texture recogni- 
tion, collected under the application called 
Photobook [3]. Existing commercial applica-
tions, such as IBM’s Query By Image Content
(QBIC), provide consistent representations. See
[4] for a recent article in the popular press. The
QBIC technology operates on color, texture,
shape, and feature locality.

The content-based representations produced by 
these projects all have the unique stamp of the 
research that produced them. If one were able to 
look at the algorithms that produced the con- 
tent-based descriptions, they may share some 
common thought, but most likely there will be 
significant differences based on local innova- 
tions. While the desire may be for content-based 
work to settle into a consensus form to enable 
broad usability, the truth is that this is a highly 
creative and fluid time period for the content-
based research community. A stable set of meth-
ods on which to build standards is not likely to
emerge in the short term. An arts organization 
may choose a single and unique content-based 
scheme for its local collection database. But it 
will be difficult if not impossible to share that 
same representation with other organizations 
using different content-based schemes. 

This fact should not deter the arts and humani- 
ties community from applying the power of 
content-based technology; on the contrary, this 
is an ideal time for application needs to be more 
clearly understood and communicated to the 
content-based community in order to ensure 
that the proper forms of representation are being 
considered and tested. Content-based technolo- 
gy holds great promise for multimedia retrieval 
and over time will create representations that 
provide unique dimensions for retrieval. 

It is important to note that content-based tech- 
nologies strive to create mathematical represen- 
tations of phenomena derived by a set of rules, 
although a complete rule set for human visual 
interpretation has not yet been formulated. (A 
highly readable discussion of this issue is inter-
woven in a recent NSF/UC Irvine report [5] for
digital video systems.) For example, one may 
observe a content-based database search for 
images on the dimension of texture, but among 
the results on the screen are usually some images 
that make no visual sense at all. To the content- 
based system it looked right, but to the human 
visual system there is a mismatch. (Imagine the 
challenges that connoisseurship studies would
provide to content-based research!) 

The reality for this technology is that complete- 
ly automatic content-based recognition is on a 
very distant horizon. It is much more likely that 
the cooperative efforts between text-based and
content-based methods will yield the most
interesting and useful results for representing
image and motion content for a very long time 
to come. 

BUILDING USER-BASED SEARCH 
MODELS FOR RETRIEVAL 
An area that has received very modest attention 
in the rush to develop image databases is image 
database user studies. Other papers in this col- 
lection will discuss this issue more thoroughly, 
but I will touch on two issues specifically relat- 
ed to the cataloging and retrieval process. The 
first issue is related to understanding the kinds 
of questions that users pose to existing systems 
to satisfy an existing work process in which they 
are engaged. The second issue is related to the 
visual review process that assists users through- 
out the selection process, since a search is not 
really over until something has been selected. 

SEARCH QUESTIONS 
Before images can be cataloged, whether by 
text- or content-based methods, it is necessary 
to establish some guidelines for what is impor- 
tant to describe. At the heart of all good data- 
base systems is an understanding of the needs of 
the people who will use the database. 

As an example, the Computer Interchange of 
Museum Information (CIMI) initiative, Project
CHIO (Cultural Heritage Information Online), 
found that this line of inquiry was fundamental 
to establishing an information sharing model. 
An IMAGELIB posting entitled “Looking for 
Mr. Rococo” [6] provided a rich source of dis- 
cussion about understanding the pattern of 
museum patrons’ search questions in their own 
words (not filtered through an intermediary). 
Their inquiry revealed several “points of view” 
that required more access points than current 
cataloging practices originally envisioned. 

Inquiry “to understand the ecology of ques- 
tions” is a valuable way to begin laying the 
foundation for constructing multi-purpose data 
records that support different kinds of system 
users. The broadest possible view is to create a 
cataloging data record whose contents may be 
rearranged to suit the requirements of multiple 
“points of view.” 
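
One way to picture a record whose contents may be rearranged for several points of view is sketched below. The layer names, field values, and view names are all hypothetical and chosen only for illustration; the sketch shows the data-structure idea, not a cataloging standard.

```python
# Hypothetical layered cataloging record; every name and value is illustrative.
RECORD = {
    "subject": {"objects": ["clock", "mantel"], "place": "parlor"},
    "technique": {"medium": "gelatin silver print", "genre": "interior"},
    "rights": {"restrictions": "educational use only"},
}

# Each point of view lists the layers it needs, in the order it wants them.
VIEWS = {
    "researcher": ["technique", "subject"],
    "licensing": ["rights", "subject"],
}


def arrange(record, view_name, views=VIEWS):
    """Return the record's layers filtered and ordered for one point of view."""
    return [(layer, record[layer]) for layer in views[view_name] if layer in record]


# Usage: the same stored record yields two different presentations.
for_researcher = arrange(RECORD, "researcher")
for_licensing = arrange(RECORD, "licensing")
```

The record is cataloged once; only the view definitions change as new user populations are identified.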

A working example of this issue is the research 
performed for the Kodak Picture Exchange 
application for commercial photography [7]. 
Image search questions (in the words of the 
originator) were collected from both image 
owners (photo agencies) and image users 
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(graphic artists, art directors). Five common 
search patterns that emerged from this inquiry 
were invaluable in establishing a “layered** 
framework for describing commercial photo- 
graphic images. In addition, these search pat- 
terns made it possible to construct data records 
that provide access to two different “points of 
view”: that of ediro» iai and advertising users. 



alent; in others, there are implications for the 
cataloging record itself As an example, two of 
the visual strategies found are discussed, togeth- 
er with their cataloging implications: 

1. Visual thinking is stimulated by images. 

People often start to look for images by using 
images. They may perform a random or direct- 
ed search through books, catalogs, files, etc. 



While existing search qr.estions cannot possibly 
model all the search variations a system may 
receive, this line of arialysis provides the data- 
base designer with ;?.n excellent starting point. 
Some examples of the search and review pat- 
terns that were observed are: 



Pattern                    To Search For

Image Elements             Contexts, objects, actions, places
Compositional Qualities    Artistic techniques, genre, medium
Subjective Responses       Mood, emotion, subjective evaluations
Spatial Relationships      Proximity, placement of objects to one another



Implication: Image databases need to provide a 
structure, like a visual table of contents, that 
users can access without specifying words. User 
interaction becomes much easier if a purely visu- 
al activity is provided as an initial welcome to a 
system or during the inevitable “dry spell” fre-
quently experienced during a search session. Not
all images in a database would necessarily be can-
didates for this browse function. Visually appro-
priate cataloging methods are needed to tag an
image as just such a browsing “candidate.”






2. Images already selected provide the basis to con-
tinue a search.

Once suitable images have been found that are
close to the desired visual match, people will 
often use selected images to submit a request such 
as “Get me more like the ones I just found.” 



Intellectual Property      Usage restrictions and pricing

The importance of a “points of view” inquiry
cannot be stressed enough. The understanding
gained in this work makes it possible to make
conscious choices about levels of cataloging
based on user populations. Further, one can cre-
ate an economic model to support cataloging
activities and evaluate cataloging tools against a
performance framework.



Implication: An understanding of image similar- 
ity features is needed if the “get more like this” 
scenario is to have good results. Arriving at a 
robust set of visual similarities for arts and 
humanities applications is a major challenge, but 
in the long term will contribute richly to the
search environment. To incorporate visual simi-
larity into a cataloging data record will require a
deep understanding of each medium and the
cognitive process used for interpretation.
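
A minimal sketch of the “get more like this” request follows, assuming each image already carries a small numeric feature vector. The three features, their values, and the use of cosine similarity are assumptions made for illustration; choosing features that are meaningful for a given medium is exactly the open problem described above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def more_like_these(selected_ids, catalog, limit=3):
    """Rank unselected images by similarity to the average of the selected ones."""
    chosen = [catalog[image_id] for image_id in selected_ids]
    centroid = [sum(column) / len(chosen) for column in zip(*chosen)]
    scored = [(cosine_similarity(centroid, vector), image_id)
              for image_id, vector in catalog.items()
              if image_id not in selected_ids]
    return [image_id for _, image_id in sorted(scored, reverse=True)[:limit]]


# Hypothetical feature vectors: (warm tones, close-up framing, soft focus)
catalog = {
    "a": [0.9, 0.1, 0.8],
    "b": [0.6, 0.4, 0.5],
    "c": [0.1, 0.9, 0.1],
    "d": [0.85, 0.15, 0.9],
}

print(more_like_these({"a"}, catalog, limit=2))
```

Averaging the selected images lets a user refine the request by adding or removing examples, without ever naming the qualities in words.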



A VISUAL THINKING MODEL 
Studies in art history and visual/mass commu- 
nications concern the interpretation of visual 
materials and their analytical deconstruction, 
but few have specifically tracked the thought 
process that supports the image search itself.
Searching for images may require different 
thought processes than searching for text-based 
materials such as documents or books; if so, 
then multimedia cataloging will have to reflect 
this fact. 

One study by Romer [7] enumerates several
visual thinking processes observed with profes- 
sional photo editors. In some cases the search 
and review process needs only a software equiv- 



RECOMMENDATIONS FOR 
RESEARCH OPPORTUNITIES

Points of View Studies

Across arts and humanities collections a wide 
variety of potential users need to be studied. 
Among the user types chosen, a quantitative 
methodology should be established for deriving 
“points of view” frameworks to guide the cata- 
loging process. 

As mentioned earlier, the two most important 
aspects to encapsulate in these studies are the 
discovery of patterns in user search questions 
and the perceptual review methods that are
employed while refining a search. Both studies 
will provide the evidence needed to design 






RESEARCH AGENDA FOR NETWORKED CULTURAL HERITAGE 






practical multimedia databases, as well as drive 
software-related development for user interfaces. 

There are few studies in either of these areas, 
but most notable is the work of P.G.B. Enser
for the Hulton Deutsch picture collection [8]. 
The CIMI discussion around Project CHIO 
appears to be the most current, active forum in 
which several “points of view” studies are 
already under way. This project also presents an 
opportunity to assess valuable tools such as the 
Categories for the Description of Works of Art with
more user-centered understanding derived from 
“points of view” studies. 

Text-based Resources Reviewed for Structure 
Existing text-based resources that support cata- 
loging practices need to be reviewed in terms of 
how well they satisfy the requirements of multi- 
media retrieval. Preliminary work is needed to 
develop a list of multimedia retrieval require- 
ments; based on this work, possible projects 
might be: 

♦ An evaluation of existing resources such as 
the AAT, LCTGM, etc. to determine how 
well they perform against a multimedia 
search model derived from “points of view” 
studies. Support for this approach is partial- 
ly found in the work of Soergel [9] related 
to user studies validating the contents of 
formal cataloging and access tools. 

♦ An evaluation to support restructuring hier- 
archical resources into semantic networks, 
i.e., structures that represent knowledge in 
an interconnected manner. Note that the 
use of a network structure eliminates many 
of the limitations surrounding hierarchical
and faceted thesauri. With a semantic net-
work it is possible to assign several relation-
ships between terms with differing weights
to provide a clear notion of the semantic
strength between terms.
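
The weighted relationships described in the last item can be sketched as a small directed graph. The class name, the relation labels, and the weights below are hypothetical and stand in for whatever vocabulary a restructured thesaurus would actually supply.

```python
class SemanticNetwork:
    """Terms connected by labeled, weighted, directed relationships."""

    def __init__(self):
        # term -> list of (related term, relation label, weight in [0, 1])
        self.edges = {}

    def relate(self, source, target, relation, weight):
        """Record one relationship from source to target."""
        self.edges.setdefault(source, []).append((target, relation, weight))

    def related(self, term, min_weight=0.0):
        """Return relationships at or above min_weight, strongest first."""
        hits = [edge for edge in self.edges.get(term, [])
                if edge[2] >= min_weight]
        return sorted(hits, key=lambda edge: edge[2], reverse=True)


# Hypothetical terms, relations, and weights.
net = SemanticNetwork()
net.relate("carriage", "wedding", "associated with", 0.3)
net.relate("carriage", "vehicle", "kind of", 0.9)
net.relate("carriage", "horse", "used with", 0.7)

print(net.related("carriage", min_weight=0.5))
```

Unlike a strict hierarchy, a term here may carry any number of relationships of different kinds, and a search can widen or narrow simply by changing the weight threshold.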

A particularly lucid theoretical discussion by 
Janice Woo [10] contrasts the issues of tradi- 
tional static organizations of concepts to 
dynamic relationships based on participatory 
actions (i.e., hierarchical vs. network structures). 
Chakravarthy [11] presents an excellent and 
thorough discussion of a prototype image 
retrieval system supported by semantic network 
technology (WordNet). 

An area of descriptive depth that is important 
to image retrieval (especially images with his- 



toric value) is the precise definition of image 
elements and their proximal relationship to one 
another. (Image elements are the tangible peo- 
ple, objects, actions, places, etc. depicted in an 
image.) Current cataloging practices do not 
focus on the mundane level of naming individ- 
ual objects or actions depicted in an image, 
focusing instead on the descriptions of thematic 
content and iconographical attributions. For the 
broadest possible access, though, there is a need 
to name individual image elements and their 
relationships to one another in a standard syn- 
tax to support precise searching capability (e.g., 
a man sitting in a carriage in front of Niagara 
Falls). A consensus on syntax across arts and 
humanities cataloging will also drive system 
vendors to incorporate this level of specificity 
for search support. 
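
One possible shape for such a syntax is a list of subject-relation-object statements, sketched below for the Niagara Falls example. The three-part form and the field names are assumptions made for illustration; they are not one of the proposed picture description languages.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical encoding of
# "a man sitting in a carriage in front of Niagara Falls".
description = [
    Relation("man", "sitting in", "carriage"),
    Relation("carriage", "in front of", "Niagara Falls"),
]


def matches(description, subject=None, predicate=None, obj=None):
    """Return the relations that agree with every field that was specified."""
    return [r for r in description
            if (subject is None or r.subject == subject)
            and (predicate is None or r.predicate == predicate)
            and (obj is None or r.obj == obj)]


print(matches(description, predicate="in front of", obj="Niagara Falls"))
```

Because each element and relationship is named separately, a search for anything in front of Niagara Falls can succeed even when the thematic description of the image never mentions the falls.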

A number of “picture description” languages 
have been proposed by several disciplines. Hibler 
[12] has suggested a very practical method.

Media Differentiators 

Each distinct multimedia type embodies percep- 
tual qualities that make it a unique vehicle for 
communication. It is important to investigate
those unique qualities and to find a cataloging
equivalent for them. For example, a photographic
image is greatly influenced by the choice of 
process used. A daguerreotype is different visual- 
ly from an ambrotype, even though both kinds 
of imagery are often housed in latched cases. For 
modern photography, lens and filter choices cre- 
ate visible differences that contribute to the 
image experience. Being able to recognize and 
catalog the differences helps an image database 
support a visually based “get more like this” sce- 
nario. A host of other differentiators for image, 
music, motion, graphics, etc. need further study 
and articulation. 

An example of an excellent manual that pre- 
sents clear descriptions of visual differentiators 
for identifying historic photographic processes 
is Reilly [13]. 

Visual Thesaurus 

Various researchers have created a number of 
“wish lists” pointing to an idea called variously a 
visual thesaurus, or picture thesaurus, or picture 
dictionary. All thinkers have a similar vision: 
having an image point to its visual “synonym.” 
More complex versions provide a genera-species
“divide and conquer” strategy. In all cases, the 
visual thesaurus provides the structure for plac- 
ing visually similar things with their relatives. A 
visual “sense” pervades the similarities across a
number of different qualities: genre, composi-
tional technique, time, etc. A recent and excel-
lent example of work in this area is by Lohse
[14]. Chang [15] has presented prototypes relat- 
ed to this topic that support visual reasoning 
using a data structure called Visual Net. 

Representative Sets of Images 
One obstacle to advancement in content-based 
technology is the lack of sizeable and realistic 
data sets tied to application requirements for 
development and test purposes. It would be
immensely valuable to establish a formal 
method to provide good data sets and share 
research results broadly between the content- 
based and arts and humanities communities. 
(Note that the data sets in use today are typical- 
ly from clip art CDs that contain very simplistic 
depictions for analysis.) While some engineer- 
ing schools work closely with their institutions' 
art history departments, there is no umbrella 
organization that then helps to synthesize and 
interpret implications more broadly. 

Music and Motion Representation 
In both the library and content-based commu- 
nities still images have received the greatest 
attention in the research literature related to 
representation and cataloging issues. While 
music and motion imaging are also topics for 
research, fewer studies exist than for the world 
of still-image applications. Both music and 
motion cataloging require more fundamental 
thought in order to arrive at the right conceptu- 
al framework for subsequent implementation. 

In music, an early dissertation by Page [16] 
focused on issues related to the written musical 
score as the fundamental starting point for 
musical representation. A paper by Wiggins 
[17] provides a framework for describing and 
evaluating music representation systems in a 
broader context. Hawley [18] analyzes the cre- 
ation of “structure out of sound” for multime- 
dia retrieval. 

Davis [19] presents a motion annotation system 
in a prototype environment called 
MediaStreams, which uses icons to describe 
video content. Csinger [20] proposes a knowl- 
edge-based framework to support the human 
effort required for annotating motion. 

CONCLUSION 

In summary, image and multimedia databases 
are heavily dependent on the quality of their 



stored descriptions, which (whether text- or 
content-based) provide the foundation for all 
meaningful interactions with a system. Several 
descriptive challenges remain to be solved in 
order to create effective representations. The 
solutions, as indicated above, appear primarily 
interdisciplinary. The ideal team would natural- 
ly be composed of professionals in information 
science, electrical engineering/computer science, 
visual/mass communications, and cognitive psy- 
chology. Each of these disciplines holds a por- 
tion of the knowledge required to support 
research in this vital and growing area. 

Multimedia “objects” (image, motion, audio, 
graphics, compound document, etc.) acquire 
useful descriptive data throughout many differ- 
ent stages of their existence. Some data is 
acquired automatically by capture devices (such 
as scanners, or digital cameras), some is added 
by human intervention through traditional cat- 
aloging methods, and yet other data is acquired 
by automatic, content-based techniques. All 
these streams of data will require intelligent 
coordination and constant attention. The end 
result is to create a richer set of descriptions for 
retrieval purposes, which can be employed in 
combination to provide more meaningful access 
to the vast heritage of the arts and humanities. 

With the wide availability and increasing usefulness of electronic media, arts and humanities educa-
tion is poised for significant change. Some of these changes are already under way, others are just
beginning to appear on the horizon. They are being met with enthusiasm from some and strong 
resistance from others. The key to the changes now under way is that a new medium makes possible 
new methods of teaching and learning and a new epistemology: new structures for representing 
knowledge. Those who have already been engaged in pushing the boundaries of their disciplines are 
the most likely to be early adopters of the technology. 

RESEARCH AND PRACTICE TO DATE 

Writing and Foreign Language Learning 

The skill-based disciplines of writing and language learning have been the most active early users of 
the technology. It is significant, and perhaps forms a useful paradigm for other humanistic disci- 
plines, that in both these cases the adoption was driven by methodological changes. 

In the teaching of writing the process model, advocated by theorist-practitioners like Donald 
Murray, Peter Elbow, and Linda Flower, was coming into wide acceptance during the late 1970s
and early 1980s. The arrival of personal computers starting in the mid-1980s made process teaching 
much easier by making it easier to create and critique multiple drafts and share the product with 
peers as well as teachers. Many aids to writing have been created and are in use across the country, 
most notably The Writer's Workbench (Bell Labs/Colorado State), which includes process aids in
addition to its original set of more controversial style checkers, and Daedalus (University of Texas), 
which allows students to hand in papers online. University-specific networked systems are in use at 
Carnegie-Mellon, where much imaginative early work was done in modeling a process-approach 
electronic writing environment, and at MIT, where the system includes an electronic classroom, a
facility for adding teachers' corrections to papers handed in electronically, and an online textbook for
technical writing. The use of electronic classrooms in which work can be displayed, critiqued, and
edited on large-screen displays has made it possible to demonstrate the process of writing in the
classroom with an ease not available under the constraints of paper and blackboards. Currently, there 
is much interest in exploring computers for teaching collaborative writing. 

Commercial systems have superseded much of the work on writing software attempted by university- 
developed systems. Spell-checkers, outliners, annotation icons, and multiple versioning software are 
all available in word processors or document systems. But the ready availability of commercial prod- 
ucts that do the job is the exception, not the rule, for humanities software, and even when commer- 
cial products are available their use is often limited by platform dependencies. 

Writing shares with foreign language teaching a laboratory approach, and writing centers and lan- 
guage laboratories are natural sites for the adoption of new technology. In foreign languages the 
interest began even before the advent of the microcomputer. The University of Illinois offered pro-
grammed language learning on the mainframe-based PLATO system. The first use of microcomput-
ers was electronic drill programs that relieved the drudgery of workbook grading. Several of these
have been developed and are in wide use, including Dasher (University of Iowa), CALLIS (Duke), 
and MacLang (Harvard). Brigham Young University developed a system for testing students online 
in order to determine what level of language course to offer them. A more flexible approach was the 
inclusion of grammar and dictionary information in specialized word processing software, a tactic 
that was also well used by James Noblitt (University of North Carolina) for foreign language learn- 
ing. Although some have used the opportunity of computer-based language learning to study the 






patterns of second-language acquisition, this 
remains an underdeveloped area of inquiry. 

Starting in the early 1980s as the communicative 
approach to language learning was becoming
accepted, multimedia was identified as offering 
tremendous potential for communicative 
methodologies. Like the process approach current 
in writing studies, the communicative approach 
was a good fit to the medium because it empha- 
sized process over product, stressing the impor- 
tance of exposure to authentic native speech 
(which can be delivered on video, richly annotat-
ed and cross-referenced), and valuing the acquisi-
tion of context-sensitive language functions (such
as expressing agreement, asking for help, greeting 
a friend or a stranger) over the memorization of 
word lists and grammar paradigms. 

Multimedia for language learning was pursued 
actively at MIT, which produced narratives and 
documentaries specifically scripted and shot for 
interactivity (Athena Language Learning 
Project) and at the University of Iowa (PICT) 
and the University of Pennsylvania, both of 
which produced systems for adding subtitling 
and phrase-by-phrase control to existing visual
material. The Iowa project focused on acquiring 
the rights to foreign television; the Pennsylvania 
project focused on films available on videodisc. 
The military service academies have made wide
use of interactive video workstations, mostly 
using re-purposed educational videos, and the 
CIA is currently working on course materials 
that would eliminate the teacher from the sys- 
tem, starting with introductory courses in 
Spanish, Russian, and Arabic. Military-spon- 
sored efforts, though well funded, have often 
been pursued at a distance from university 
methodologies and research. 

University-centered efforts have not looked to 
eliminate the teacher but to reform language 
teaching in order to incorporate more authentic 
video, facilitate discovery learning by students, 
and move the teacher to the role of a task 
designer rather than the sole provider of infor-
mation. The difficulty with a teacher- and text-
centered approach to language learning is that
the teacher, often not a native speaker, becomes 
the sole model of the language. The text pre- 
sents language in a way that emphasizes written 
over oral forms and sometimes leaves students 
unable to speak or comprehend spoken lan- 
guage. By contrast, electronic media can offer 
multiple native conversationalists and introduce 
native speech from the earliest stages of lan- 



guage learning without overwhelming the learn- 
er. What is needed next is a more clearly 
defined methodology to exploit the technology 
appropriately. 

Two advanced potential areas of language learn- 
ing software await a more developed technolo- 
gy: grammar correction and pronunciation 
practice. Natural language processing systems 
have been used to model language teaching 
(Xerox PARC, MIT, Carnegie-Mellon,
University of Maryland), but this remains an
area of research with only limited experience 
with actual students. The technology for creat- 
ing spectrograms is now widely available on 
desktop computers, but despite promising early 
work in adopting it for language learning 
(MIT) it has not yet been developed for wide 
use. Both of these await development over the 
next decade. 

History, Literature, and Culture 
In the traditional humanities core disciplines 
electronic educational materials have been devel-
oped in response to the demands of specific sub-
ject matter. Although no methodology has been
explicitly articulated, there has been a general 
attempt to introduce dense primary materials at 
the undergraduate level and to synthesize com- 
plex materials that had previously been studied 
separately. In the field of history two simulations 
of the 1980s provide models that have not yet
been widely followed. One, “The Would-Be 
Gentleman” (Stanford) invited students to expe- 
rience ancien regime France in the persona of a 
young man trying to succeed at court. It includ- 
ed economic simulations as well as cultural 
knowledge, such as how to make an advanta- 
geous marriage. Another, “The Great American 
History Machine” (Carnegie-Mellon) offered 
census data and numerous ways of configuring it 
and representing it graphically, allowing students 
to explore many possible correlations in social 
trends. Interactive video simulations have also 
been used at Carnegie-Mellon to introduce phi- 
losophy students to issues in ethics. These are all 
areas in which hands-on manipulation of a sim-
ulated world or statistical model can foster the
process of humanistic exploration of many 
answers to the same question, or many causes of 
one result. Despite their promise, little effort has 
gone into the creation of such models so far. 

The marketplace is responding at the level of 
the electronic textbook; commercial and univer- 
sity publishers have begun offering literary and 
critical works in electronic form. The most 
ambitious of these, the Voyager Company, has
created a format well suited for teaching pur-
poses. With Voyager's Extended Books the teacher
can prepare a teaching edition complete with
marginal notes, highlighting of passages, mark- 
ing of pages, automated searches for keywords, 
and a notebook for copying citations complete 
with source and page number. 

With the advent of CD-ROMs some of these 
books have multimedia extensions. Among the 
most promising of the Voyager series are “Who 
Built America?,” a history of the United States 
from a working class and leftist viewpoint; 
“American Poetry,” an anthology that includes 
readings of the poems in digital audio; and Art
Spiegelman's “Maus,” a presentation of his com-
pelling graphic novel on the Holocaust with
primary documents and records of his drawing 
process. The extended book is clearly a format 
that publishers are comfortable with, since it 
retains the book metaphor and allows the use of
texts that already have a reputation and a fol- 
lowing. Although current software is slow and 
awkward in many ways, and the book metaphor 
can be very limiting, extended books hold great 
promise as classroom presentation tools and for 
library reference. Their use will probably be 
supplementary in the immediate future
until reductions in cost and the spread of elec- 
tronic technology make it practical to use elec- 
tronic media as the primary delivery medium 
for texts. 

More ambitiously, several comprehensive proj- 
ects have aimed at using hypertext architecture 
to present teaching materials. The Perseus 
Project (developed at Harvard, but now housed 
at Tufts University) presents a wide range of 
visual objects from ancient Greece combined 
with the texts of Greek literature. At Brown 
University the Intermedia Project of the 1980s 
was enthusiastically adopted in the humanities 
with critical webs developed for nineteenth-cen- 
tury authors under the direction of George 
Landow. When the Intermedia software became 
obsolete, these webs were later transferred to 
Story Space. The project demonstrated that 
hypertext could be used to model the method- 
ology of the humanities as well as represent its 
content. It also raised many still-unanswered 
questions about the difficulties of navigation in 
hypertextual environments. 

Most of the current work in hypermedia envi- 
ronments has centered on single-author collec- 
tions, including the development of electronic 



editions, which combine texts with photofac- 
similes of original texts, and with video and 
audio of performances. For instance, work in 
this area is in progress on Manrique (University 
of North Carolina), Goethe (Dartmouth), Yeats 
(University of Tennessee), and James Joyce 
(Boston University). Other projects (Rossetti at 
the University of Virginia, and Shakespeare at 
MIT) transcend the edition and move toward 
creating comprehensive electronic archives that 
serve both teaching and research purposes. The 
attempt of these projects (and many others 
rapidly springing to life) is to bring together in 
appropriate proximity to one another materials 
that are hard to find or not previously found. 

For every large project with substantial 
resources there are probably a hundred home-
grown HyperCard stacks (or ToolBook stacks, or,
increasingly, HTML Web sites) developed for 
individual courses at single institutions. The 
widespread use of simple hypertext and hyper- 
media structures will increase the level of 
sophistication and the demand for more com- 
plex tools among humanists in general. 

Although it is limited to text, the Women
Writers' Project (Brown) is remarkable in that
the compiling of an electronic archive has facili-
tated the teaching of otherwise unknown or
inaccessible texts, although the texts themselves
are often issued in book form. The Brown proj-
ect is an exception to what seems to be a trend
to establish archives of single male authors. 
Clearly more work needs to be done to make 
sure the electronic environment offers wide cov- 
erage of our cultural heritage and is not devel- 
oped haphazardly. 

The work of the Text Encoding Initiative (TEI) 
group has been a tremendous boon in offering 
standards for archiving text, but that takes care 
of only one part of the puzzle. A similar effort 
is needed in developing software for accessing 
these text archives, especially hypermedia 
archives. Attempts at multimedia authoring 
environments at Brown, MIT, Stanford, and 
Dartmouth have been either too large or too 
small, but never just right. Furthermore, the 
marketplace is unlikely to supply the kind of 
archiving environment needed by humanists, 
who require both precision of reference and 
preservation of context, and who also need to 
shift focus from one document to another (and 
one medium to another) as they work. It would 
be useful to encourage several archive/edition 
projects to collaborate in developing a standard 
cross-platform, open-architecture authoring and 
reference environment for humanists. Provided
that the range of users were broad enough and 
the resources for developing the environment 
were sufficient, an archive architecture with 
multiple examples of implementation could be 
developed within five years. With an open 
architecture it could continue to be improved 
upon and refined with code that could be 
shared among institutions. 

Film and Media Studies 

Electronic technologies offer great promise for 
the field of film and media studies, but this 
promise is hampered for now by copyright 
issues. Several promising projects, including 
Larry Friedlander's Shakespeare Project of the
1980s (Stanford) and the UCLA Roger Rabbit 
project, were prevented from reaching wider dis- 
tribution owing to copyright issues. As more 
films become available in electronic format, they 
can be bought separately and then used in con- 
junction with educational software. But this will 
not solve the problem of network delivery. Legal 
solutions are more important to this area of edu- 
cational innovation than software solutions. 

Teaching Creative Artists in the New Media 
In addition to furthering the study of the exist- 
ing arts and humanities, the electronic media 
are giving rise to new art forms. Michael Joyce's
afternoon (1987) and Stuart Moulthrop's
Victory Garden (1992) are notable examples of 
the genre of hypertext fiction. Electronic fiction 
courses have been offered at Brown (by Robert 
Coover) and at MIT (by myself) since the early 
1990s, long enough to begin to see new genres
emerging as young writers born into a world of 
interactive media come to maturity. Central to 
this effort is the perception that the computer 
and the Internet are not just telephone wires for 
carrying “content” in traditional linear formats:
they constitute a new medium that will have its 
own structures of representation and therefore 
its own appropriate forms of artistic expression. 

Again, teaching efforts are hampered by a lack
of software development. The current authoring 
environments for hypertext narratives — Web 
browsers, HyperCard and its imitators, and 
Story Space — are all structurally limited. There 
is a pressing need for software that will facilitate
spatialized writing (i.e., writing that is navigated
rather than paged through), making links, and 
creating interactive structures without program- 
ming knowledge. More ambitiously, there is a 
need to adopt the methodologies of artificial 



intelligence, particularly knowledge representa- 
tion and agent-building, for the making of plot, 
character, and narrative form. 

The Use of the Internet 

Access to materials over the Internet is increas- 
ing exponentially for scholars and students. The 
increase in material on the global spaghetti plate 
known as the World Wide Web makes the job 
of humanities librarians particularly crucial. The 
editorial functions of reviewing, filtering, vet-
ting, listing, and annotating sources will become
increasingly valuable as available materials prolif- 
erate. Teachers will need guides to important 
resources and assessments of their reliability. 
Students will need training in how to navigate, 
use, and evaluate Internet resources. Software 
will be needed to access the many kinds of 
information — bibliography, hyperlinks, quanti- 
tative databases, audio and video files — on the 
Web and make it readily available to students. 
Humanities educators will be in particular need 
of clearer copyright rulings, and of the extension 
of “fair use” rules to electronic media. 

The Perceived Threat to the Book 
and to the Teacher 

One of the results of the increase in the use of 
electronic media is a re-evaluation of books as a 
technology for disseminating knowledge.
Ongoing scholarship on the beginnings of the 
print era is helping to contextualize current 
unease at the supplanting of the book as the 
primary means of intellectual communication. 

A debate has been joined over whether thought 
itself depends upon the linear presentation and 
physical pages and binding of the book, or 
whether other modes of organization and pre- 
sentation may sometimes be preferable for cap- 
turing the richness of the human intellect. 

At the same time economic forces are calling for 
electronic delivery of “distance learning” inde- 
pendent of the instructor. The humanities and 
the arts are particularly vulnerable, with weaker 
funding sources but a higher level of dependen- 
cy on personal interchange. 

Both of these challenges call for a careful con-
sideration of the appropriate roles for electronic
media in carrying forward the work of human- 
ists and artists. Attention should be paid to 
identifying what kinds of intellectual processes 
are facilitated by the new media. It will be 
important to sponsor significant educational 
innovations, large enough to constitute a depar- 
ture from usual procedures, and to develop 



reliable, qualitative methods of evaluating edu- 
cational results in the humanities. The anthro- 
pological approach developed at Brown for the 
Intermedia Project might serve as a good model 
for qualitative evaluation. 

FUTURE DIRECTIONS 
The next step for humanities teaching and 
learning is the creation of course-sized materi- 
als, the electronic equivalent of the textbook, 
and the development of new curricula based on 
electronic delivery of information. For instance, 
foreign language teaching should be rethought 
now that it is possible to deliver large databanks 
of authentic speech with extensive annotation 
that make them accessible to the novice. 
Shakespeare studies, which have long struggled 
with videotapes, can be reformulated once we 
can deliver an environment that allows for 
immediate retrieval of quarto, folio, and multi- 
ple performances. History can be taught with a 
much larger access to databases and primary 
materials at the undergraduate level. Now that 
we understand some of the basic elements of 
humanities educational computing, the next 
stage will be to develop core reference/learning 
environments and to reformulate curricula to 
take advantage of them. 

The new learning paradigms will require
redesign of classrooms as well, with special care 
to create spaces where students can speak to one 
another and to the teacher as well as interact 
with computer displays. The next few years 
should begin to offer us some models for work- 
ing humanities classrooms, based on models in 
wide use now at such places as Stanford,
Brown, MIT, and Penn State.

The creation of course-sized electronic curricula 
will require work along the other directions 
already mentioned: the standardization of deliv- 
ery environments; the collective design and 
development of authoring software specialized 
for humanities applications; the development of 
new copyright procedures for digital material; 
the refinement of qualitative evaluation procedures
for humanities education.

In all of these areas, it is important that humanists
take the initiative in shaping the educational
environment of the next century.

STATE OF THE ART

The proliferation of electronic information and communication systems has created a crisis of 
accountability and evidence. As more and more of the records of our society are available in electron- 
ic form, users are asking how they can be sure electronic records created in the past will be available 
in the future and how they can be sure those received today are trustworthy. The issue is critical for 
all aspects of humanistic studies because these scholarly disciplines depend on the study of original 
texts, images, and multimedia sources. To even imagine the humanities, it is essential to have correct 
attribution, certainty of authenticity, and the ability to view sources many decades or centuries after 
they are created. 

While the question of how to create and preserve electronic evidence (records with provable authen- 
ticity) has been with us as long as computing, research in this field is relatively new in part because 
until very recently few source materials were created electronically, and available solely in electronic 
form. Thus, in 1991, when the U.S. National Historical Publications and Records Commission
sponsored a working meeting on Research Issues in Electronic Records, virtually no published 
research was available. Since the publication of the report of that meeting, the field has proliferated 
(see special issues of American Archivist (U.S.), Archivaria (Canada), and Archives & Manuscripts 
(Australia), within the past year), although major areas are still underdeveloped. 

Currently the research in archiving and authenticity falls into four broad categories: 

♦ Preserving signals recorded on different media 

♦ Preserving “recordness,” or the attributes that ensure evidential value, which some refer to as 
“intellectual preservation” 

♦ Preserving functionality, or ensuring software independence and interoperability 

♦ Establishing a social and legal standard for evidence, supported by best practices and guidelines 

On the simplest level, archiving has to do with preserving bits. Because electronic recording media 
are inherently unstable, it has always been a matter of concern to ensure that the electronic signal be 
preserved over time. Practical interest in denser and longer-lasting methods of storing data has meant 
that the short history of electronic recording has witnessed the commercialization of a large number
of different data storage media and media formats. The rapid evolution of media has meant that 
considerable attention has been devoted to avoiding obsolescence and developing methods to read 
and copy media from previous generations of systems. In general, previous media, layouts, and for- 
mats can be read with appropriate hardware and special-purpose software, but devising new methods 
to read old signals in old media is becoming more complex as media proliferate, recording and lay- 
out methods become more proprietary, and firmware plays a greater role in decoding. 

Archivists, and increasingly scholars, are aware that beyond preservation of bits lies the arena of preserving
“recordness.” Research into what makes an electronic document or dataset a record, and how
the constituent parts can be bound together, has become critical as communication of electronic 
information has become more widespread. In the past several years, electronic mail, groupware, and 
digital image banks have forced society to confront the issue of authenticity or reliability of an elec- 
tronic communication and spawned much research. Most recently, research has attempted to define 
the functional requirements for recordkeeping and the meta-data attributes of evidence. 




Electronic records are always software dependent, but the extent of these dependencies varies widely. 
More and more electronic objects are not merely static entities, but parts of systems in which they
represent potential functionality. In recent
years, dynamic links, objects that affect system 
states, and data entities that respond to their 
environment have significantly increased the 
difficulty of preserving electronic records. New
questions are arising about the concept of 
migrating functionality and the meaning of 
interoperability. Methods of overcoming, or at 
least representing, software dependence over 
time are critical to the survival of the record.

Finally, society has responded unevenly to the 
spread of electronic communication capabilities. 
Some new legal and professional standards have 
been established; elsewhere research is under 
way to define new practices and guidelines for 
electronic documentation and action. Methods 
for bilateral commercial contractual communi- 
cation are in place, but multilateral methods are 
still being studied. How to enable electronic 
patient records, patent documentation, or copy- 
right registration, and how to ensure privacy, 
confidentiality, protection of proprietary infor- 
mation, and the management of similar infor- 
mation-related risks is the subject of active 
research on the interface between sociology, policy,
and technology.

CURRENT RESEARCH AND 
ITS PROMISE 

While research continues on each new medium, 
to establish its life and the best conditions for 
its storage and use, the research agenda has 
moved beyond storing bits with the growing 
acceptance that the only way to preserve elec- 
tronic data across time is to periodically copy 
(refresh) the information to new storage media 
and, at appropriate times, to new formats. 
Leadership in these technical means of preserv- 
ing bits has belonged to the National Media 
Laboratory, a spin-off of the 3M Company and 
the contractor used by federal projects and by 
the National Institute of Standards and Technology, which
establishes tests for media. In recent years con- 
siderable research has focused on how to deter- 
mine the right time for media conversion, how 
to choose appropriate new media, and how to 
predict long-term costs. While this research is 
important to computer operations, it does not 
contribute specifically to arts and humanities 
computing. 
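
The refresh cycle described above can be illustrated with a minimal sketch. It assumes that a record is a single file and that a SHA-256 digest is an acceptable check that the bits survived the copy; the function names and the choice of digest are illustrative assumptions, not practices drawn from the research cited here.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, target: Path) -> str:
    """Copy a file to new storage and confirm the copy is bit-identical.

    Hypothetical helper: raises OSError if the digests disagree.
    """
    before = sha256_of(source)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    after = sha256_of(target)
    if before != after:
        raise OSError(f"refresh of {source} failed verification")
    return after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp) / "old_medium" / "record.dat"
        old.parent.mkdir()
        old.write_bytes(b"minutes of the 1995 meeting")
        new = Path(tmp) / "new_medium" / "record.dat"
        print(refresh(old, new) == sha256_of(old))  # prints True
```

A check of this kind addresses only the preservation of bits; it says nothing about whether the format can still be read or whether the record remains authentic.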

The issue of the authenticity of records, on the 
other hand, is at the heart of all humanistic 
scholarship. If we do not know the context in
which information was created, and who participated
in creating it, many of the questions of
greatest interest to historians, philosophers, linguists,
and creative artists are unanswerable.
Contemporary electronic information systems 
generally do not create or store records that sat- 
isfy these criteria. Not surprisingly, research into 
methods of ensuring the creation and retention 
of electronic evidence is a hot topic in archives, 
museums, and electronic libraries. The most 
important research in this area has focused on 
the functional requirements for records. It has 
appeared under the corporate names of the 
National Archives of Canada, the World Bank,
and more recently the University of Pittsburgh.
It is recognized in the published research of the
Rand Corporation and the Dutch Ministry of
the Interior. This research joins a recent thread
of discussion and debate in the library community
regarding what Peter Graham of Rutgers
University has called “intellectual preservation.”
Although this concern is the focus of discussion 
in the Task Force on Digital Archiving spon- 
sored by The Research Libraries 
Group/Commission on Preservation and 
Access, at present it is not really the subject of 
original research in the library community. 

Current research on software dependence and 
interoperability, which is largely not driven by
archival concerns, takes a relatively short-term 
view of the requirement to preserve functionali- 
ty. Little research has been done on modeling 
the information loss that accompanies multiple 
migrations or the risks inherent in using com- 
mercial systems before standards are developed, 
yet these are the critical questions being posed 
by archives. Little in these studies specifically 
addresses the humanities, except that the 
humanities are particularly heavy users of old 
documentation and thus especially need to 
develop means of overcoming system dependen- 
cies in data. 

Margaret Hedstrom of the New York State 
Archives, and the University of Pittsburgh proj- 
ect, have led the way in exploring the social and 
legal guidelines for electronic records manage- 
ment. The Association for Information and 
Image Management has sponsored conferences 
and a task force that examines these issues; the 
Center for Electronic Law at Villanova 
University is also working in this area. There
has been substantial research in electronic labo- 
ratory notebooks and electronic patient records, 
but oddly little research has been done to iden- 
tify critical dimensions of archiving for program 
audits in areas like decision support systems, 
groupware and team support systems, or even
traditional “management information systems”
or project management environments. 

Related areas of research include: 

♦ Methods for conversion of paper-based 
information to electronic media; research at 
Cornell University and Yale University
is most noteworthy.

♦ Knowledge representation, including espe- 
cially the documentation of archives using 
SGML, as reflected in the work of the Text 
Encoding Initiative.

FUTURE RESEARCH NEEDS 
The most significant area for research in the 
near future is the meta-data required for recordness
and the means to capture this data and
ensure that it is bonded to electronic communi- 
cations. The announcement by the National 
Institute of Standards and Technology of a proposed Federal
Information Processing Standard (FIPS) for
“Record Description Records” could be the
stimulus for immediate research, as could the proposal
by Standards Australia, based on the
University of Pittsburgh research. Continued
investigation of mechanisms to specify meta-data
encapsulated objects and capture them in
implementations is most promising. Over the
next five years, specifications for workgroup 
tools and electronic office environments will 
need to have these methods built in. Large-scale 
networks, and the acceptance of electronic 
transactions as the preferred means of intra-cor- 
porate communications, will depend on meth- 
ods of uniquely identifying messages, 
controlling their access and use, and decoding 
their structure, context, and content. As the scientific
community has come to realize, standard
meta-data, grounded in a continually
updated understanding of disciplinary perspec- 
tives, is essential to future documentation. 
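
One way to picture meta-data that is "bonded" to a communication is an envelope whose seal covers the identifier, the contextual meta-data, and the content together, so that a change to any one of them is detectable. The sketch below is an illustration under stated assumptions only: the class name, the field names, and the use of a SHA-256 digest are hypothetical, and they do not reproduce the proposed FIPS, the Standards Australia proposal, or the University of Pittsburgh specifications.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass, field


@dataclass
class RecordEnvelope:
    """A message plus the contextual meta-data that makes it a record."""

    content: bytes
    # Hypothetical fields such as creator or business process;
    # values must be JSON-serializable for the seal to be computed.
    metadata: dict
    # A unique identifier is assigned when none is supplied.
    identifier: str = field(default_factory=lambda: uuid.uuid4().hex)

    def seal(self) -> str:
        """Return one digest covering identifier, meta-data, and content."""
        payload = json.dumps(
            {
                "identifier": self.identifier,
                "metadata": self.metadata,
                "content": self.content.hex(),
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_intact(envelope: RecordEnvelope, seal: str) -> bool:
    """True if the envelope still matches a previously computed seal."""
    return envelope.seal() == seal


if __name__ == "__main__":
    memo = RecordEnvelope(
        content=b"Approved.",
        metadata={"creator": "J. Smith", "business_process": "grant review"},
    )
    original_seal = memo.seal()
    memo.metadata["creator"] = "Someone Else"
    print(is_intact(memo, original_seal))  # prints False
```

A digest alone shows only that something changed. Establishing who created the record, and under what authority, would require the organizational and legal measures discussed in this paper.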

Unless generic, scaleable approaches for repre- 
senting humanistic points of view are developed 
soon, the history of modern societies in the late 
twentieth century will be extremely incomplete, 
to the detriment of future scholarship in all 
humanities fields.

Ongoing applied research on the archival significance
of dynamic documents, object-oriented
software environments, and interoperability is 
needed in the medium term. There is very little 
active work in this area, but the potential bene- 
fits to archives would be substantial if even such 
basic questions as the best ways to avoid loss of 
functionality in software migrations were
answered. Solutions to most of these problems
will need to involve collaborations between 
technologists, archival participants, and poten- 
tial future users. Such research projects can be 
expected to be relatively costly and of extended 
duration, and will be ongoing as new function- 
alities are propagated. Yet unless such software 
independence can be achieved, we can hardly 
imagine the widespread acceptance of interac- 
tive documents or multimedia and visualiza- 
tions within traditional communications. 

Within organizations, archivists must find auto- 
matic means of identifying the business process 
for which a record is generated. Such data mod- 
eling will become increasingly critical in an era 
of ongoing business re-engineering. If records 
are retained for their evidential significance and 
for a period associated with risk, then certain 
knowledge of their functional source is essential 
to their rational control. If they are retained for 
long-term informational value, knowledge of 
context is necessary to understand their signifi- 
cance. Work in these areas will be stimulated by 
standards such as those drafted by Standards 
Australia and NIST in the spring of 1995. 

Concrete work on social and legal issues will be 
best focused on identifying warrant for archival 
functional requirements in professional and 
organizational practices, locating required 
changes in law in such areas as privacy, freedom 
of information, and protection of proprietary 
rights and in applications such as electronic 
patient records, electronic laboratory notebooks, 
and contractually obligating electronic commu- 
nications and commerce. While progress can be 
expected in all these areas anyway, a concerted 
research agenda would coordinate findings, has- 
ten the arrival of the fully electronic society, and 
make it possible to realize the benefits of elec- 
tronic records within the next decade. Much 
work on attributes of electronic business systems 
is being conducted in these areas, but it is cur- 
rently little informed by professional archivists. 

Ultimately, we must research the use of elec- 
tronic records after their value for accountabili- 
ty has been realized. How and why are they 
used? What value does their information have 
for users, and is the value of information in 
records created for other purposes commensu- 
rate with the value of information contained in 
self-consciously created information sources, 
such as books and articles? What do we need to 
know about the content of records to justify 
discovering and retrieving billions of them
across heterogeneous environments? What does
the subsequent use of records itself tell us about 
the nature of society in the years since the cre- 
ation of the record and the transaction it docu- 
ments? Here a lead could be taken by archivists, 
but little substantive research has been under- 
taken to date except in the area of defining the 
requirements for networked information discovery
and retrieval.

It is now evident that we can envision a world in 
which virtually all records are digital, including 
much of the knowledge of the past. How can we 
make our solutions to retention, access, and 
preservation of the digital cultural heritage of 
the world scaleable? What cost-efficiencies can 
we achieve over keeping paper records and mak- 
ing them available through libraries, archives, 
and museums when we are deploying systems of 
distributed control and access spanning all 
records? Future research will need to focus on a 
variety of implementation issues having to do 
with intelligent information seeking, end-to-end 
delivery, and migration of data on a universal 
scale. Again, very little has been done in this
area, although recent progress implementing 
Government Information Locators using the 
Z39.50 protocols suggests some of the potential
for a Global Information Infrastructure locator
and document delivery service.

Presumably man's spirit should be elevated if he can better review
his shady past and analyze more completely and objectively his 
present problems. He has built a civilization so complex that he 
needs to mechanize his record more fully if he is to push his exper- 
iment to its logical conclusion and not merely become bogged 
down part way there by overtaxing his limited memory. His 
excursion may be more enjoyable if he can reacquire the privilege 
of forgetting the manifold things he does not need to have imme- 
diately at hand, with some assurance that he can find them again 
if they prove important.

Vannevar Bush understood the multi-polarity of technologically induced and -supported change: 
computing, scholarship, and society weaving an intricate dance, each responding to and in turn gen- 
erating a complex web of new and old forces, institutions, rules and standards, ideas. Reviewing the 
settings in which these transformations occur is a requisite first step toward assessing their impact on 
scholarship in the arts and humanities. 

This paper discusses the interplay between distributed networked computing and creativity and scholarship
in the arts and humanities. The first section provides an overview of certain elements of this
evolving relationship, including role transformation and agents as well as inhibitors of continuing concurrent
development. The next section discusses four major uses of networked computing for the arts
and humanities, and the final section identifies an agenda for further research and development. 

ROLES, RESPONSIBILITIES, EXPECTATIONS 

Over the last several years, traditional distinctions among key actors and activities within scholarly 
creation and communications have begun to disappear. Words like “creator,” “publisher,” “user,” 
“work,” “document,” “institution,” “record” have become problematic, as the activities they represent
and the borders that separate them have blurred. Original source materials (such as the recently discovered
cave paintings in France, and the Whitman and Vatican archives) are increasingly available
to all users of the Internet/World Wide Web. Internet discussion groups lack traditional status markers
(such as “Doctor,” “professor”): according to the by now well-known New Yorker cartoon showing
two dogs seated in front of a computer terminal, “On the Internet, nobody knows you’re a dog.” 

The lack of status markers can empower institution-free research: the demarcation between academic 
and private scholarship, already dissipating in the sciences, is difficult to sustain when major 
resources and outlets for research are widely distributed. 

Parallel transformations are taking place in the major institutions that sustain and utilize arts and 
humanities scholarship. Scholarly publishers (in the arts and humanities, largely but not exclusively 
smaller publishers and societies) feel threatened by alternative modes of dissemination (by individuals 
and libraries, for instance) and the proliferation of peer-reviewed electronic journals accessible on the 
Internet. Some of these journals, such as Psycoloquy and the Bryn Mawr Classical Review, have an
Internet circulation that greatly exceeds the subscription list for many print journals.

Research libraries face similar uncertainties: budget reductions coupled with continuing price
increases for scholarly books and journals have forced even the largest, best-endowed libraries to consider
access rather than ownership as a key measure of excellence. But ensuring access to research
information also means replacing the current
library-centric system with a multi-institutional
model supporting distributed information management,
with associated structures for contracting,
budgeting, billing, and payment. And
the increase in electronic access to original
material means that museums and galleries
must change as well. The technical requirements
of distributed dissemination and ownership
of scholarly information are relatively
straightforward; the institutional ones are difficult
to define, and much harder to resolve.

BOUNDARIES AND BOTTLENECKS 
The pace of change is rapid, and difficult to 
assess. Several other bottlenecks, arising from 
the complex transition from traditional to net- 
work-driven scholarship, are worth mentioning 
as well. First, the universes of discourse in the 
arts and humanities and in computing are fun- 
damentally different. To oversimplify quite a 
bit, the humanities and the arts are about structure,
dialogue, insight, and expanding frameworks;
computing is about answers. Computer
scientists are more uncomfortable with the
World Wide Web than humanists are: it's good
at generating questions, bad at answering them. 

Traditionally, one must pass through at least 
three key gates (with their gatekeepers) in order 
to become a recognized scholar: complete the 
dissertation, be hired by the right institution, 
get tenure. Not only in the arts and humanities, 
but even in the sciences, computer-assisted
scholarship and dissemination have little if any
role in these critical processes. At a recent conference,
participants rejected as totally unrealistic
a five-year goal of tenure entirely supported by 
electronic scholarship. Without movement in 
this direction, however, only already tenured and 
private scholars will be able to make full use of 
the power and promise of computer-supported 
research and dissemination. 

In a networked world the lines separating cre- 
ator, publisher, library, and museum become 
blurred. Further complicating the situation are 
uncertainties about the basic nature of electron- 
ically created and disseminated information. In 
a print-centric world, for example, the differ- 
ence between an original and a copy is obvious; 
it is difficult to alter the text of a book or picture
without leaving traces. But there is no discernible
difference between an original and an
instantiation of a computer-accessible book or 
picture, and alterations are hard to identify and 
trace. Furthermore, the difference between pub- 
lished and unpublished print works is under- 
stood; in a networked world, electronic mail 
(for instance) is owned by its originator, and 
probably (usually) unpublished. 

Rather than looking for new roles (with new 
boundaries) to replace the older ones, it may
help to think about managing annuli, or zones
of progressive release (see figure below).

Note that this model includes no roles, only
processes. Roles bear assumptions about the
present into the future, while processes are easier
to define and debate.

THE USES OF COMPUTING IN THE 
ARTS AND HUMANITIES 
Four potential contributions of computing to 
the arts and humanities are discussed here: 
resource identification; analysis; collaboration 
and re-creation; dissemination. Each can 
increase access to information in the arts and 
humanities, despite significant social, economic, 
organizational, and technical challenges. 

Resource Identification 
Lycos (the largest World Wide Web search 
engine) currently indexes more than sixteen 
million home pages. By the time this study is 
distributed, there will be several million more. 
In addition, there are several thousand Internet 
and Usenet mailing lists, and thousands more 
on private systems like CompuServe and 
America Online. Traffic on the Internet continues
to double about every eight months.

Currently, Internet/World Wide Web users discover
resources by means of an intricate mesh
of personal relationships (often mediated by
electronic mail), hyperlinks to related resources
(as defined by the link's creator), print and electronic
directories, and serendipity.

This process is frustrating and time-consuming
at best, intensified by the intrinsic uncertainties
in the Internet (e.g., whether a resource has
moved or disappeared, and whether it can be
reached). Improving resource discovery is less a
technical than a social and organizational problem,
bringing to bear the skills of scholars and
librarians: scholars to direct the construction of
domain ontologies, for example, and librarians
to generate and manage distributed subject
matter and ensure access to and coherence of a
given collection.

The explicit and implicit systems for assessing
value in the print world are scarce or absent in
networked information: peer review across a full
range of disciplines; the standing of the particular
publication, gallery, or museum; the background,
experience, and credentials of the
author or creator. Except for a few peer-reviewed
electronic journals, these value markers
have not been translated into the digital
world: indeed, resistance to externally mandated
assessment is rooted deep in Internet culture.
Furthermore, librarians have traditionally
focused on developing collections and identifying
resources rather than assessing the value of
any particular information resource in relation
to a specified need. What will be greatly needed
are automated summarization, integration of
related works into single multimedia documents,
and automated tracking of the origin
and evolution of particular works. As value-added
services evolve, users will demand quality
standards; at present, neither the tools nor the
social and economic infrastructure exist to support
them.

Analysis

Structured digital archives like ARTFL (for
French language and literature) permit
researchers to search a document corpus and
locate related texts within and among various
documents. Advanced programs make it possible
to use semantic analysis to compare the styles of
various works and authors. For some time, database
programs have allowed users to introduce
complex statistical analyses into arts and humanities
scholarship (e.g., cliometrics in history).



These investigations are possible because the 
fundamental elements (words, sentences, and 
paragraphs) of written and oral communica- 
tions are clearly defined for any given language, 
and carry a shared constellation of meanings. 

For pictorial or sound works, however, the situ- 
ation is murkier. Currently, works in non-textu- 
al media are cataloged by attaching to each of 
them sets of descriptive words using a prede- 
fined structure and vocabulary. These words 
permit a searcher in a photographic archive, for 
example, to find pictures of sunsets, or boats in 
a harbor; depending on the conventions used to 
describe the photographs, finding pictures of 
boats at sunset may also be possible. Despite 
extensive research, tools for identifying similar 
pictures, for instance, are erratic and primitive; 
it is hard to imagine a social infrastructure and 
technology that would provide a helpful answer 
to a question like “I want more music which 
makes me feel like the last piece did.” 
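
The descriptor-based cataloging described above amounts to matching a set of search terms against the set of words attached to each work. The sketch below is a minimal illustration with a hypothetical three-photograph catalog and an invented function name. It shows why "boats at sunset" can be found only if the cataloger happened to attach both terms.

```python
def find_works(catalog, *terms):
    """Return the ids of works whose descriptors include every search term.

    `catalog` maps a work id to a set of descriptive words drawn from
    a predefined vocabulary (hypothetical data in this sketch).
    """
    wanted = {term.lower() for term in terms}
    return sorted(
        work_id
        for work_id, descriptors in catalog.items()
        if wanted <= {d.lower() for d in descriptors}
    )


if __name__ == "__main__":
    CATALOG = {
        "photo-001": {"sunset", "beach"},
        "photo-002": {"boats", "harbor"},
        "photo-003": {"boats", "harbor", "sunset"},
    }
    print(find_works(CATALOG, "boats", "sunset"))  # prints ['photo-003']
```

Nothing in such a scheme can answer a query about how a work makes the searcher feel, since no descriptor of that kind was ever recorded.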

Collaboration and Re-Creation
Network-supported scholarship is intrinsically 
collaborative. Electronic mail, for instance, per- 
mits physically separated colleagues to collabo- 
rate on research and publication. Equally 
important, Internet listservs help researchers 
identify others who share common interests, 
which may ultimately lead to new, collaborative 
research projects. Finally, networks support 
expanding authorship. In the last several years, 
the average number of authors of scientific
papers has increased significantly: in some scientific
disciplines, papers with 100 or more
authors are not uncommon. These new capabil- 
ities contradict the traditional model of the soli- 
tary scholar seeking tenure, or the lone painter 
in her attic at midnight. 

Such collaborations require support from new 
models for identifying and managing author- 
ship and ownership. Clearly, increasing from 
hundreds to thousands of authors for individual 
works simply exacerbates the problem, but currently
there are no clear methods for establishing
and measuring the relative contributions of
each. In fact, it is hard to imagine how such 
methods might operate: how much credit, for 
instance, should go to an author who wrote half 
an article, as opposed to another who provided 
the critical insight but wrote none of the words? 
These problems are difficult enough in static 
environments. In a networked, digital world, 
works will be created, revised, and expanded; 
new media will be incorporated; links to exter- 
nal resources will be generated; the resulting 
work may not share a single sentence or image 
with the original one, despite a clear chain of 
provenance. Whose work is it? Legally? 
Intellectually? Morally? 

Dissemination 

Inextricably linked to evolving systems for col- 
laboration and re-creation of information are 
new methods of disseminating scholarly results 
in the arts and humanities. The proliferation of 
scholarly subspecialties has led to an increase in 
the number, and narrowing of the scope, of 
scholarly publications. With circulation declin- 
ing as a result of budget reductions for libraries, 
among other factors, it is increasingly difficult 
for scholarship in the arts and humanities to 
find an audience. Artists and composers face 
similar obstacles. 

Networked dissemination via the
Internet/World Wide Web substantially reduces 
the barriers to entry, and lowers the cost of dis- 
semination. For example, setting up a Web site 
to display an artist’s works requires only a net- 
work connection, one of the several Web elec- 
tronic or print manuals, and patience. And new 
Web sites (particularly if they are announced via 
NCSA's “What's New” page, for example) will
be sought out by Web surfers. Absent standards
of assessment, such as the institutional trappings
of peer review, private Internet dissemination
or distribution through a non-reviewed
electronic journal is unlikely to further traditional
careers. There is a real risk that individual
disciplines will develop an intensified version of 
C.P. Snow's two cultures: one lodged in universities
and print, the other everywhere else.

AGENDA FOR FURTHER 
DISCUSSION, RESEARCH, AND 
DEVELOPMENT 

Infrastructure 

Artists and humanists depend on a reliable, pre- 
dictable, coherent, and comprehensive informa- 
tion infrastructure. Users of major research 
libraries, for instance, can depend on well-orga- 
nized, comprehensive collections; consistent 
intellectual coherence from one library to 
another; and timely access to the major 
resources required. These systems, in turn, are 
supported by common sets of expectations and 
standards, painfully developed over many years 
in the library and museum communities. While 
certain coherent standards (such as URLs 
(Uniform Resource Locators) and Internet pro- 
tocols) already exist in the universe of digital 
information, other important ones (including 
naming, registration, and archiving conven- 
tions) are required. Further, the distributed, 
centrifugal force of the Internet is not always 
compatible with the centripetal force of shared, 
consistent protocols and standards. 

The World Wide Web amply demonstrates that 
a system dependent on URLs does not scale 
upward easily. URL-identified servers move or 
disappear; popular sites are inaccessible owing 
to burgeoning demand; location-dependent 
mirror sites are rapidly submerged in requests. 
Location-independent naming conventions 
(such as the handle system developed by the 
Corporation for National Research Initiatives), 
which are easily resolved into the location(s) of 
the digital information, would address this 
problem. But standardizing around any particu- 
lar convention is difficult for the Internet. In 
the meantime, the standards of coherence and 
reliability represented by libraries and museums 
will be lacking for many types of networked 
information. 
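
The idea of a name that is resolved into one or more current locations can be illustrated with a minimal sketch. This is not the CNRI handle system itself; the class `NameResolver`, the name `hypothetical/as-we-may-think`, and the `example` hosts are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NameResolver:
    """Toy registry mapping a persistent name to its current location(s)."""

    locations: dict[str, list[str]] = field(default_factory=dict)

    def register(self, name: str, url: str) -> None:
        # A name may resolve to several locations (for example, mirror sites).
        self.locations.setdefault(name, []).append(url)

    def move(self, name: str, old_url: str, new_url: str) -> None:
        # When a server moves, only the registry entry changes; the name,
        # and every citation that uses it, stays the same.
        urls = self.locations[name]
        urls[urls.index(old_url)] = new_url

    def resolve(self, name: str) -> list[str]:
        # Return a copy so callers cannot alter the registry by accident.
        return list(self.locations.get(name, []))


# Hypothetical name and hosts, used only to exercise the sketch.
resolver = NameResolver()
resolver.register("hypothetical/as-we-may-think", "http://host-a.example/bush.txt")
resolver.register("hypothetical/as-we-may-think", "http://mirror.example/bush.txt")
resolver.move(
    "hypothetical/as-we-may-think",
    "http://host-a.example/bush.txt",
    "http://host-b.example/archive/bush.txt",
)
```

A reader who cited the name rather than the first URL would still reach the work after the move, which is the property a bare URL lacks.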

Global, consistent naming conventions derive 
their usefulness from standardized methods for 
registering digital information objects. Systems 
are required that permit creators and their 
agents to register the existence of a particular 
information object, determine the terms and 
conditions for its use, and identify which if any 
digital library systems are authorized to store
and disseminate it. In addition, reliable recording
systems are needed to allow potential users
of information to identify who owns what. The 
technical requirements for these systems are well 
understood; the organizational framework 
remains to be developed. 
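
As a rough illustration of the registration functions just described (recording that an object exists, its terms of use, its owner, and the systems authorized to hold it), a registry record might look like the following sketch. The field names, the `Registry` class, and the sample values are assumptions made for this example, not part of any existing system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    name: str                   # location-independent name of the object
    owner: str                  # creator or agent who registered it
    terms: str                  # terms and conditions for its use
    authorized: frozenset[str]  # systems allowed to store and disseminate it


class Registry:
    def __init__(self) -> None:
        self._records: dict[str, Registration] = {}

    def register(self, record: Registration) -> None:
        # Registering the same name twice would make ownership ambiguous.
        if record.name in self._records:
            raise ValueError(f"{record.name!r} is already registered")
        self._records[record.name] = record

    def owner_of(self, name: str) -> str:
        # Lets a potential user find out who owns a registered object.
        return self._records[name].owner

    def may_disseminate(self, name: str, system: str) -> bool:
        # Unregistered objects are treated as not authorized anywhere.
        record = self._records.get(name)
        return record is not None and system in record.authorized


# Hypothetical object, owner, and library names.
registry = Registry()
registry.register(
    Registration(
        name="hypothetical/poem-1",
        owner="A. Poet",
        terms="free for scholarly use",
        authorized=frozenset({"library-a"}),
    )
)
```

The code is the easy part, as the text notes; deciding who operates such a registry and whose records are trusted is the organizational problem.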

For centuries, libraries and museums have pro- 
tected rare works of art and scholarship from 
destruction. In a networked environment, however,
there are no straightforward methods to
determine that a particular byte stream is, in 
fact, the last instantiation of a given work. 
Culturally, it is easy to delete a message, much 
harder to throw away a book.[iv] Technically, it
might be possible, for instance, to link any rare
digital information object to a program that
searches the Net for another instantiation 
before permitting itself to be deleted. Building a 
common framework supporting institutional 
cooperation across millions of digital collections 
and billions of information objects over hundreds
of years will be much more difficult.
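
The technical half of this idea, a deletion that first looks for another instantiation, can be sketched as follows. The `holdings` dictionary is a hypothetical in-memory stand-in for searching the Net, and the site and object names are invented.

```python
class LastCopyError(Exception):
    """Raised when deletion would destroy the only known instantiation."""


def delete_copy(holdings: dict[str, set[str]], site: str, name: str) -> None:
    """Delete `name` from `site` only if some other site still holds it."""
    if name not in holdings.get(site, set()):
        return  # nothing to delete at this site
    # Look for another instantiation before permitting the deletion.
    others = [s for s, objects in holdings.items() if s != site and name in objects]
    if not others:
        raise LastCopyError(f"{site} holds the last known copy of {name!r}")
    holdings[site].discard(name)


# Two hypothetical libraries that both hold the same object.
holdings = {"library-a": {"census-1960"}, "library-b": {"census-1960"}}
delete_copy(holdings, "library-a", "census-1960")  # allowed: library-b has a copy
```

A second call for `library-b` would raise `LastCopyError`. The sketch assumes every holding is visible to the check, which is exactly the institutional cooperation the text identifies as the harder problem.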

Already, recently developed digital information 
objects (such as the 1960 U.S. census and some 
early NASA data) are inaccessible owing to 
arcane and untranslatable data structures. There
are complex technical and organizational prob- 
lems in refreshing large volumes of digital infor- 
mation to ensure compatibility with new 
formats. The Task Force on Archiving of Digital 
Information, sponsored by The Commission on 
Preservation and Access and The Research 
Libraries Group, is reviewing these issues and 
will present its findings in the summer of 1995. 

Enhancing Access 

The idea of access embodies several distinct, 
potentially divergent models of technology, rela- 
tionships, and the individual creator or user. One 
model defines a funnel from (ideally) potentially 
infinite information resources at one end to (ide- 
ally) a specific answer to a stated question at the 
other: a historian seeking a date or a geographer 
looking for a map, for instance. While this model 
may support limited interaction between information
seeker and information resource, the purpose
of the interaction is to narrow the funnel,
not expand it. 

Several new capabilities are required to support 
this model of access. As mentioned earlier, 
methods are needed for determining and 
attaching quality assessments to information 
resources, tuned for particular purposes; so are
automated techniques to condense, summarize,
integrate, translate, invoice, and pay for information
from different sources. Underlying these
technologies, social and organizational struc- 
tures are required for building and supporting 
flexible domain-specific ontologies. 

A different model, of which browsing is an 
example, seeks common threads among appar- 
ently disparate information resources. Here, 
interactions between user and resource generally 
focus on expanding the funnel, or altering the 
course of the information flow. Tracing the 
World Wide Web’s hyperlinks, for example, 
leads a user along intricately woven paths 
defined by each Web page’s creators, ending
only with exhaustion of the user’s time, money,
or patience. 

A third model focuses on a dialogue between 
the user and a set of information resources 
(including its creator and other users); the
information resource provides a framework for 
initial exchanges, which may result in new or 
transformed resources that may initiate new dis- 
cussions. This model links the network as an 
information resource with the network as a 
framework for interchange (demonstrated, for 
instance, in Internet chat and mailing lists). At 
least primitive technologies exist to support all 
three of these models; only the second one 
(hyperlinks) is widely supported at present. 

The third model depends on a range of capabilities
that are only just being identified. First, it 
requires seamless links between and among per- 
sonal, collaborative, and public work and play 
spaces, dynamically controlled by the user. The 
annuli model of progressive release, outlined 
above, provides an initial version of this capabil- 
ity. A multi-dimensional workspace, for exam- 
ple, would permit a creator/user (an artist, a 
poet, a scholar) to manage dialogues about par- 
ticular works along a path from private to pub- 
lic, determining at every point what 
information to retain, what to seek, what to 
share, when to talk, when to listen.[v]

Second, this model mandates seamless linkages, 
controlled by the creator or user, among informa- 
tion objects in all media. It should be straightfor- 
ward, for example, to add voice or video to 
electronic mail; or to participate in a virtual con- 
ference, seated at a virtual conference table, 
observing the expressions and movements of one’s 
virtual colleagues; or to translate speech to text, 
and text to speech.[vi] It should be possible to carry
on most aspects of our private and public lives,
choosing face-to-face contact when it is desired,
not when it is required for communication.

CONCLUSION 

“The historian, with a vast chronological
account of a people, parallels it with a skip trail
which stops only at the salient items, and can
follow at any time contemporary trails which
lead him all over civilization at a particular
epoch. There is a new profession of trail blazers,
those who find delight in the task of establishing
useful trails through the enormous mass of
the common record. The inheritance from the
master becomes, not only his additions to the
world’s record, but for his disciples the entire
scaffolding by which they were erected.”[vii]

How far are we from achieving Bush’s vision?
Who will be the trailblazers? What social and
economic mechanisms will be required to support
trailblazers in the arts and humanities, as
well as those who come after? 

These questions need to be asked and answered 
in and through a complex, dynamic dialogue 
among multiple communities of practice, 
including individuals and institutions in the arts 
and humanities and computing, libraries, librari- 
ans and information scientists, policy makers, 
creators, publishers, distributors of print, sound, 
visual, multimedia, and digital information, pri- 
vate scholars, students, and many more. The 
dialogue involves speakers, listeners, and the 
spoken-for: all too often, the views of (for 
instance) artists, humanities scholars, and librari- 
ans have been presented by others. 

A major purpose of this paper, and the continu- 
ing discussions it is intended to stimulate and 
frame, is empowering the spoken-for to speak 
for themselves, by finding a shared language 
and a collective voice. Bush began this dialogue 
fifty years ago, and Bush’s vision remains powerful
because it encapsulated technology in service
to larger intellectual and social goals. 
Negotiating those goals, and identifying the 
technologies that will serve them, remains as 
significant and challenging as it was for Bush. It 
is time for new voices to be heard, and new 
audiences to hear them. 



Notes 



i Bush, Vannevar. “As We May Think.” The
Atlantic Monthly (July 1945): Section 8, paragraph
9, page 14. [Pagination of the HTML version will
differ from this citation, which refers to the
ASCII version available over the Internet.]

ii The claim that a million monkeys typing at a mil- 
lion word processors for a million years would 
sooner or later produce the works of Shakespeare 
has been disproved by the Internet (anecdote 
courtesy of Michael Lesk, itinerant sage). 

iii A friend of mine teaches selected essences of 
deconstructionist theory to computer science stu- 
dents. Since it matches their model of the world, 
they find it generally straightforward and obvious.

iv At a recent conference, I proposed the following
criterion for determining the effectiveness of a global
system of digital libraries: that within five years,
it would be as easy to throw away a book as to 
delete a message. There was an audible gasp from 
the audience. 

v Buckminster Fuller used to tell a story about a
Master of one of the colleges of Cambridge
University, who noticed a deep crack in the massive
beam supporting the college’s dining hall. Not
knowing where to report it, he eventually notified 
the Royal Forester, who told him that he had been 
expecting the call. The Forester’s predecessor’s pre- 
decessor had planted the tree for the new beam, 
and it was ready. This, Fuller noted, was how a 
society ought to work. 

vi The last goal has been straightforward (and elu- 
sive) for thirty years. 

vii Bush, Section 8, paragraph 2.

TOPICAL INDEX TO THE PAPERS

Paper 1 Tools for Creating and Exploiting Content

Kolker and Shneiderman

Diversity of the state of the art
Disparity of equipment and access; mostly less than ideal
Few involved full time in creation of tools for their disciplines
Internet/networked access; primitive organization of resources and access methods
Exemplary Internet sites
Asian languages and interface research
Workstation software, especially parsers
Future need to bring electronic resources to students
Future needs for computer literacy of faculty
Need for interface standards and methods of content access




Paper 2 Knowledge Representation

Hockey

Broad issues in knowledge representation
Fidelity in text representation
Genre or form in text representation
Problem of representing structure and content independently
Role of meta-data in making implicit information more explicit
Text representation and the role of SGML in preventing obsolescence
SGML-based text representation projects
Multiple parallel hierarchies in SGML as a research problem
HTML
Standards; non-text conversion at bit rather than content level
Representation of abstract categories such as weight, time, measures
Representation of missing, incomplete, and sourced information
Representations as surrogates
Representations as more than surrogates
Versions
Non-linearity
Representing the processes/context of creation
Representing representation conventions employed
Linking; how to link objects of different modalities; research issues
Typed relations and functionality
Representing derived knowledge
Representing traditional/paper sources
Representing legacy data
Automatic conversion of representations
Cost factors of representation
Quality of representations research



Paper 3 Resource Search and Discovery

Marchionini

Remote access increases need for search and discovery
Need to integrate search and discovery tools with creation, use, and communication tools
Existing genres of finding tools need electronic analogs
Definition of search and discovery, and distinction between the two
Map conceptual space to physical locations
Primary, secondary, and tertiary sources all on Internet together
Need for dynamic updating
Evolving methods of string searching
Little progress other than in text
Ranking of results
Using domain-based knowledge in retrieval
Filtering and user profiles in retrieval
Browsing as a method of discovery
Guided discovery - the use of links
Feedback of representations in discovery and browsing
Relevance feedback
Automatic indexing of resources - toolsets and issues
Interactive interfaces and visualization as feedback
Value placed on variety in expression in humanities as penalty to retrieval
Value placed on older sources in humanities as penalty to discovery
Evolution of concepts over time as penalty to retrieval
Multilinguality as penalty
Data acquisition costs in humanities
Imprecision of audience
Need to combine multiple approaches and integrated methods
User perspectives
Thesaurus merging
Commentaries, pathfinders, and tools with a point of view
Levels of knowledge-based access
Multilingual issues
Critical mass
Pattern matching
Audience analysis/feedback
Readers as authors
Bibliography on search and discovery research



Paper 4 Conversion of Traditional Source Materials into Digital Form

Kenney

Digital surrogates for paper/film
Problem that bit maps aren't indexable or searchable
History of digital text surrogacy efforts
Purpose of surrogacy in adding value for analysis
History of digital image surrogacy efforts
Why digital capture became cheaper in 1990s
Continuing demand for content/keyword searching
Research issues posed by large compendia
Other large-scale projects
Research issues in the use and impact of digital surrogates
Capture and quality standards
Near-term research: benchmarks for quality by purpose
Near-term issues - evaluation criteria
Near-term issues - production and throughput
Color management
Automatic capture settings
Image transmission and end-user perception of usability
Intelligent files
Pattern matching and object recognition
Raster-vector conversion and functionality
Compression research
Cost-effectiveness
Automated selection and control
Business case
User needs and perceptions
Display - dramatic improvement needed
Bibliography on digital capture


Paper 5 Image and Multimedia Retrieval

Romer

Lack of both tools and approaches for multimedia cataloging
Software for image databases proprietary and weak on retrieval
Lack of tradition in image cataloging
Retrieval results based on data representation
Non-textually-based retrievals and auto-indexing
Can text-based approach enhance image-based approach?
Research on what is meant by similarity in different modalities
Map to languages and symbols
Formal properties of genres in different modalities
How to escape from words
Content attribute identification within images
State of art still too primitive and domain specific
User-based search models and points of view research
Layered questions, layered representations
Visual thinking thought processes need to be understood
Likeness as a criterion
Points of view as future research need
Evaluation of text-based retrieval results
Media-based significant attributes need to be identified
Visual thesaurus functionality
Representing sets of images rather than individual images
Issues in music representation
Motion representation schemes
Use of image and multimedia depends on quality and purpose of their representations
Representations are multiple and acquired throughout life cycle, need coordination
Bibliography on image representation and retrieval research

Paper 6 Learning and Teaching

Murray

New medium will make possible new methods of teaching and learning
Used to date in skill-based disciplines - writing and reading
Many early research tools now embodied in commercial software
Foreign-language instruction, especially laboratory and online exams, benefit
New approaches to instruction developed to use communicative capacity of multimedia
Purpose to use teacher in role of task designer rather than sole source of information
Grammar and pronunciation practice still require better technology
Simulations to teach history
Electronic textbook technology and the market
Corpora and rich webs in disciplines and specialties
Hypermedia archives around single authors
Lack of systematic coverage or coordination of hypermedia projects
Need for standard for text management software
Promise of multimedia for media studies affected by legal and delivery issues
Non-linear authoring, for creative writing, needs better tools
Information retrieval and editorial review critical for Internet
Rise of distance learning demands research on how learning takes place
Need to explore course-length hypermedia packages
Redesign of classrooms needs research/implementation
Evaluation methods
Natural language processing and speech recognition promising for language teaching
Hypermedia authoring and reference environments urgently needed
Creative arts software support for non-linearity required



Paper 7 Archiving and Authenticity

Bearman

Humanistic studies depend on attribution, sourcing, and context
Long-term intelligibility and usability is a necessity
Proliferation of studies in past few years cited
Preserving bits requires recopying media
Preserving recordness requires meta-data
Preserving functionality requires robust representations
Cultural and legal concepts of evidence
Current research on preserving bits not very important to humanities
Current research on recordness is critical to humanities
Little current research on preserving functionality
Cultural concept of evidence research
Related areas in digitization and knowledge representation
Future work in meta-data standards for evidence
Future research in collaborative tools
Future research needs in migration and dynamic document management
Future research required in business processes and self-documenting records
Literary warrant for evidence research
Issues in use of records
Problems of scalability and implementation
Bibliography of recent electronic archiving research



Paper 8 New Social and Economic Mechanisms

Garrett

Interplay between technology, scholarship, and society
Traditional roles in scholarship breaking down (creator, user, publisher etc.)
Scholarly publishers, libraries, journals all changing and threatened
Gatekeeper roles blurred
Original and copy, hence the act of creativity itself, is blurred
Possible approach is to manage processes in life cycle of ideas
Resource identification systems
Evaluation, automatic summarization, and integration of sources
Analysis of the characteristics of large databases
Collaboration tools with mechanisms for assigning responsibility and credit
Impact of lowered entry barriers for scholarship/publishing
Need for reliable, standard infrastructure
Location-independent naming of objects
Registration methods for digital objects
Methods to prevent destruction of last copy/archive copy
Methods to ensure usability of digital objects over long term
Methods to increase precision in searches
Methods to increase recall with and beyond browsing
Dynamic, interactive dialogue in retrieval
Mechanisms to support trailblazers

RESEARCH AGENDA FOR NETWORKED CULTURAL HERITAGE



Glossary 



AAT         Art & Architecture Thesaurus
AHIP        The Getty Art History Information Program
ARTFL       A database of French language and literature
CETH        Center for Electronic Texts in the Humanities
CHIO        Cultural Heritage Information Online, a CIMI project
CIMI        Computer Interchange of Museum Information
CNI         Coalition for Networked Information
FAQ         Frequently Asked Questions
FIPS        Federal Information Processing Standard
GIS         Geographical Information System
H-Net       A group of 57 listservs in the humanities
HTML        Hypertext Markup Language
IATH        Institute for Advanced Technology in the Humanities
ICONCLASS   A computer-based system for classifying iconography
IOLS        integrated online library system
ISO         International Standards Organization
LCTGM       Library of Congress Thesaurus of Graphic Materials
Lycos       A search engine on the World Wide Web
MARC        Methodology for Art Reproduction in Color (also, Machine Readable Cataloging)
MOO         Multi-User Dungeon, Object Oriented environment
MTF         modulation transfer function
NEH         National Endowment for the Humanities
NRC         National Research Council
NSF/ARPA    National Science Foundation/Advanced Research Projects Agency
OCR         optical character recognition
PDF         portable document format
QBIC        Query by Image Content
RIT         Rochester Institute of Technology
RLG         The Research Libraries Group
RLIN        The Research Libraries Information Network
SGML        Standard Generalized Markup Language
TEI         Text Encoding Initiative
TLG         Thesaurus Linguae Graecae
URL         Uniform Resource Locator (address on World Wide Web)
USGS        U.S. Geological Survey
WAIS        Wide Area Information Server
Web, WWW    World Wide Web
XDOD        Xerox [document system]
Yahoo       A search engine on the World Wide Web